PREFACE


THE methods of progress in theoretical physics have undergone a vast change 
during the Twentieth Century. The classical tradition has been to consider 
the world to be an association of observable objects (particles, fluids, fields, &c.) 
moving about according to definite laws of force, so that one could form a mental 
picture in space and time of the whole scheme. This led to a physics whose 
aim was to make assumptions about the mechanism and forces connecting these 
observable objects, to account for their behaviour in the simplest possible way. 
It has become increasingly evident in recent times, however, that nature works 
on a different plan. Her fundamental laws do not govern the world as it 
appears in our mental picture in any very direct way, but instead they control 
a substratum of which we cannot form a mental picture without introducing 
irrelevancies. The formulation of these laws requires the use of the mathematics 
of transformations. The important things in the world appear as the invariants 
(or more generally the nearly invariants, or quantities with simple transformation 
properties) of these transformations. The things we are immediately aware of are 
the relations of these nearly invariants to a certain frame of reference, usually one 
chosen so as to introduce special simplifying features which are unimportant from 
the point of view of general theory. 

The growth of the use of transformation theory, as applied first to relativity 
and later to the quantum theory, is the essence of the new method in theoretical 
physics. Further progress lies in the direction of making our equations invariant 
under wider and still wider transformations. This state of affairs is very satisfactory 
from a philosophical point of view, as implying an increasing recognition of the part 
played by the observer introducing the regularities that appear in the observations, 
and a lack of arbitrariness in the ways of nature, but it makes things less easy for 
the learner of physics. The new theories, if one looks apart from their mathematical 
setting, are built up from physical concepts which cannot be explained in terms of 
things previously known to the student, which cannot even be explained adequately 
in words at all. Like the fundamental concepts (e.g. proximity, identity) which 
every one must learn on one’s arrival into the world, the newer concepts of physics 
can be mastered only by long familiarity with their properties and uses. 

From the mathematical side the approach to the new theories presents no 
difficulties, as the mathematics required (at any rate that which is required for 
the development of physics up to the ‘early Twentieth Century’) is not essentially 
different from what had been current for a considerable time. Mathematics is 
the tool specially suited for dealing with abstract concepts of any kind and there 
is no limit to its power in this field. For this reason a book on the new physics, 
if not purely descriptive of experimental work, must be essentially mathematical. 
All the same the mathematics is only a tool and one should learn to hold the 
physical ideas in one’s mind without reference to the mathematical form. In this
book I have tried to keep the physics to the forefront, by beginning with an entirely 
physical chapter and in the later work examining the physical meaning underlying 
the formalism wherever possible. The amount of theoretical ground one has to 
cover before being able to solve problems of real practical value is rather large, but 
this circumstance is an inevitable consequence of the fundamental part played by 
transformation theory and is likely to become more pronounced in the theoretical 
physics of the future. 

With regard to the mathematical form in which the theory can be presented, 
an author must decide at the outset between two methods. There is the symbolic 
method, which deals directly in an abstract way with the quantities of fundamental 
importance (the invariants, &c., of the transformations) and there is the method 
of co-ordinates or representations, which deals with sets of numbers corresponding 
to these quantities. The second of these has usually been used for the presentation 
of quantum mechanics (in fact it has been used practically exclusively, with
the exception of Weyl’s book Gruppentheorie und Quantenmechanik). It is known
under one or other of the two names ‘Wave Mechanics’ and ‘Matrix Mechanics’ 
according to which physical things receive emphasis in the treatment, the states 
of a system or its dynamical variables. It has the advantage that the kind of 
mathematics required is more familiar to the average student, and also it is 
the historical method. 

The symbolic method, however, seems to go more deeply into the nature 
of things. It enables one to express the physical laws in a neat and concise way, and 
will probably be increasingly used in the future as it becomes better understood 
and its own special mathematics gets developed. For this reason I have chosen 
the symbolic method, introducing the representatives later merely as an aid to 
practical calculation. This has necessitated a complete break from the historical 
line of development, but this break is an advantage through enabling the approach 
to the new ideas to be as direct as possible. 

The second half of the book contains applications to all the main fields in 
which quantum mechanics has been found useful. These applications all follow
strictly from the general assumptions of the first half, with the exception of those 
of the last chapter, which gives a further theoretical development. 

P. A. M. D. 

ST JOHN’S COLLEGE, CAMBRIDGE 

29 May 1930 
I. THE PRINCIPLE OF
SUPERPOSITION 


1. Waves and Particles 


IN the application of classical electrodynamics to atomic phenomena one meets 
with difficulties of a very fundamental nature, which show that the classical theory 
is irreconcilable with the facts. For instance, it is quite hopeless on the basis of 
classical ideas to try to account for the remarkable stability of atoms and molecules 
that is required in order that substances may have definite physical and chemical 
properties. These difficulties have necessitated a modification of some of the most 
fundamental laws of nature and have led to a new system of mechanics, called 
quantum mechanics, since its most surprising (although not its most important) 
differences from the old mechanics apparently show a discontinuity in certain 
physical processes and a discreteness in certain dynamical variables. 

Classical electrodynamics forms a self-consistent and very elegant theory, 
and one might be inclined to think that no modification of it would be possible 
which did not introduce arbitrary features and completely spoil its beauty. This is 
not so, however, since quantum mechanics, after passing through many stages and 
having its fundamental concepts changed more than once, has now reached a form 
in which it can be based on general laws and is, although not yet quite complete, 
even more elegant and pleasing than the classical theory in those problems with 
which it deals. This is brought about by the fact that the changes made in 
the classical theory are very few in number, although they are of a fundamental 
nature and involve the introduction of entirely new concepts, and are such that 
practically all the features of the classical theory to which it owes its attractiveness 
can be taken over unchanged into the new theory. 

The necessity for a fundamental departure from the laws and concepts of 
classical mechanics is seen most clearly by a consideration of experimentally 
established facts on the nature of light. On the one hand the phenomena of 
interference and diffraction can be explained only on the basis of a wave theory 
of light; on the other, phenomena such as photo-electric emission and scattering 
by free electrons show that light is composed of small particles, which are called
photons, each having a definite energy and momentum depending on the frequency 
of the light. These photons appear to have just as real an existence as electrons, 
or any other particles known in physics. A fraction of a photon is never observed, 
so that we may safely assume it cannot exist. 

To obtain a consistent theory of light which shall include interference and 
diffraction phenomena, we must consider the photons as being controlled by waves, 
in some way which cannot be understood from the point of view of ordinary 
mechanics. This intimate connexion between waves and particles is of very great 
generality in the new quantum mechanics. It occurs not only in the case of light. 
All particles are connected in this way with waves, which control them and give rise 
to interference and diffraction phenomena under suitable conditions. The influence 
of the waves on the motion of the particles is less noticeable the more massive 
the particles and only in the case of photons, the lightest of all particles, is it 
easily demonstrated. 

The waves and particles should be regarded as two abstractions which are 
useful for describing the same physical reality. One must not picture this reality as 
containing both the waves and particles together and try to construct a mechanism, 
acting according to classical laws, which shall correctly describe their connexion 
and account for the motion of the particles. Any such attempt would be quite 
opposed to the principles by which modern physics advances. What quantum 
mechanics does is to try to formulate the underlying laws in such a way that 
one can determine from them without ambiguity what will happen under any 
given experimental conditions. It would be useless and meaningless to attempt 
to go more deeply into the relations between waves and particles than is required 
for this purpose. 


2. The Polarization of Photons


Although the idea of a physical reality being describable by both particles 
and waves, which are connected in some curious manner, is of far-reaching 
importance and wide applications, yet it is only a special case of a much 
more general principle, the Principle of Superposition. This principle forms 
the fundamental new idea of quantum mechanics and the basis of the departure 
from the classical theory. In order to lead up to an explanation of this 
principle, we shall first take a very simple special case of it, which is provided 
by a consideration of the polarization of light. It is known experimentally 
that when plane-polarized light is used for ejecting photo-electrons, there is 
a preferential direction for the electron emission. Thus the polarization properties 
of light are closely connected with its corpuscular properties and one must ascribe 
a polarization to the photons. One must consider, for instance, a beam of light
plane polarized in a certain direction as consisting of photons each of which is plane 
polarized in that direction and a beam of circularly polarized light as consisting of 
photons each circularly polarized. Every photon is in a certain state of polarization, 
as we shall say. The difficulty is now how we are to fit in these ideas with the known 
facts about the resolution of light into polarized components and the recomposition 
of these components. 

Suppose, for instance, that we have a beam of plane-polarized light passing 
through a polariscope and getting resolved into two components polarized at 
angles of α and α + ½π with the direction of polarization of the incident beam.
The intensities of the two components will be, according to classical optics,
respectively cos²α and sin²α times that of the original beam. Let us say that
a photon of the original beam is in the state of polarization 0 and a photon in one
or other of the two components is in the state α or α + ½π respectively. The question
that now arises is: What must we consider happens to each individual photon when
it reaches the polariscope? How do the photons in the state 0 change into photons
in the states α and α + ½π?

This question cannot be answered without the help of an entirely new concept 
which is quite foreign to classical ideas. We shall therefore first consider 
another question of a different type, namely, what will be the result of any 
particular experiment which one may perform to try to determine what happens to 
an individual photon when it reaches the polariscope. It is only questions of this 
type that are really important, and quantum mechanics always gives a definite 
answer to them. Any answer that may be given to our first question, i.e. any 
description of the whole course of a photon during the experiment, would be simply 
a device to help us to remember the results of the experiments. We ought not to be 
surprised if no such description based on classical ideas is possible. 

The most direct experiment of this kind would be to use an incident beam 
consisting of only a single photon and then to measure the energy in each of the 
two components. The result predicted by quantum mechanics is that sometimes 
one would find the whole of the energy in one component and at other times
one would find the whole in the other component. One would never find part of
the energy in one and part in the other. Experiment can never reveal a fraction
of a photon. If one did the experiment a large number of times, one would find
in a fraction cos²α of the total number of times that the whole of the energy
is in the α-component and in a fraction sin²α that the whole of the energy is
in the (α + ½π)-component. One may thus say that a photon has a probability
cos²α of appearing in the α-component and a probability sin²α of appearing in
the (α + ½π)-component. These values for the probabilities lead to the correct
classical distribution of energy between the two components when the number of
photons in the incident beam is large. 
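
The statistical statement just made may be illustrated by a small numerical sketch.
It is not part of the original argument: the function name simulate_polariscope, the
fixed seed, and the use of a pseudo-random generator to stand in for the indeterminate
outcome of each single-photon experiment are all assumptions made for illustration.

```python
import math
import random


def simulate_polariscope(alpha, n_photons, seed=0):
    """Send n_photons through the polariscope one at a time.

    Each photon is found whole in exactly one component, never split.
    Returns the fraction of photons found in the alpha-component.
    """
    rng = random.Random(seed)          # hypothetical stand-in for indeterminacy
    p_alpha = math.cos(alpha) ** 2     # probability of the alpha-component
    in_alpha = sum(1 for _ in range(n_photons) if rng.random() < p_alpha)
    return in_alpha / n_photons


if __name__ == "__main__":
    alpha = math.pi / 6
    for n in (10, 1_000, 100_000):
        observed = simulate_polariscope(alpha, n)
        print(n, observed, math.cos(alpha) ** 2)
```

For a small number of photons the observed fraction fluctuates appreciably; for a large
number it approaches cos²α, the classical intensity ratio.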

Thus the individuality of the photon is preserved in all cases, but only at 
the expense of determinacy. The result of an experiment is not determined, as it 
would be according to the classical theory, by the conditions under the control 
of the experimenter. The most that can be predicted is the probability of 
occurrence of each of the possible results. This lack of determinacy, which 
runs through the whole of quantum mechanics and is in stark contradiction to 
the classical theory, may at first sight appear to be unsatisfactory, as implying 
a departure from the law of causality. It should be remarked, though, that 
if one makes any experimental arrangement to observe the energy of one of 
the components (e.g. by reflection by a movable mirror and measurement of 
the recoil momentum communicated to the mirror), it will always be impossible 
subsequently to recombine the two components to produce interference effects. 
The observation must inevitably produce, as we shall see from the general 
laws of quantum mechanics, a change in phase of uncertain and unpredictable 
amount. One may therefore, as has been pointed out by Niels Bohr,* ascribe 
the lack of determinacy in the result to the uncertainty in the disturbance which 
the observation necessarily makes, although one cannot inquire closely into how it 
comes about. The apparent failure of causality is from this point of view due to 
a theoretically necessary clumsiness in the means of observation. 

We must now consider the answer to our first question and give a description 
of the photon throughout the course of the experiment. A description consisting 
of a continuous picture in the classical sense is not possible. The description which 
quantum mechanics allows us to give is merely a manner of speaking which is of 
value in helping us to deduce and to remember the results of experiments and 
which never leads to wrong conclusions. One should not try to give too much 
meaning to it. 

It is necessary to suppose a peculiar relationship to exist between the different
states of polarization, which is such that when, for instance, a photon is in
the state 0, it may be considered as being partly in the state α and partly
in the state α + ½π. Similarly it could be considered as partly in state β and
partly in state β + ½π, where β is any other angle of polarization, or as partly in
the state of left-circular polarization and partly in that of right-circular polarization.
More generally, one could consider it partly in each of two states plane polarized
in two directions that are not at right angles, though this is seldom convenient,
or one could consider it partly in each of more than two states. There are thus
many ways of describing the photon, which are all always permissible and equally
good theoretically, although, of course, the one that says the photon is entirely
in state 0 is simpler than those that say it is ‘distributed’ over two or more
states. When we say that the photon is distributed over two or more given states
the description is, of course, only qualitative, but in the mathematical theory
it is made exact by the introduction of numbers to specify the distribution, which
determine the weights with which the different states occur in it.

*See the article by Bohr, N. ‘The Quantum Postulate and the Recent Development of Atomic
Theory’, Nature 121, 580-590 (1928). https://doi.org/10.1038/121580a0

One cannot picture in detail a photon being partly in each of two states; still less 
can one see how this can be equivalent to its being partly in each of two other 
different states or wholly in a single state. We must, however, get used to the new 
relationships between the states which are implied by this manner of speaking and 
must build up a consistent mathematical theory governing them. 
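
As a foretaste of such a theory, the following sketch computes the weights with which
a photon in the state 0 is distributed over several pairs of states. It rests on
assumptions not made in the text so far: each state is represented by a pair of
complex numbers (the two components of the classical light vector), the weight is taken
to be the squared modulus of an overlap, and the sign convention chosen for the two
circular states is arbitrary. The names plane, weight, LEFT and RIGHT are illustrative.

```python
import math


def plane(angle):
    """Plane-polarized state at the given angle, as a pair of complex components."""
    return (complex(math.cos(angle)), complex(math.sin(angle)))


# Circular states; which one is called "left" is a convention assumed here.
LEFT = (complex(1 / math.sqrt(2)), complex(0, 1 / math.sqrt(2)))
RIGHT = (complex(1 / math.sqrt(2)), complex(0, -1 / math.sqrt(2)))


def weight(basis_state, state):
    """Assumed rule: weight = squared modulus of the overlap of the two states."""
    overlap = sum(b.conjugate() * s for b, s in zip(basis_state, state))
    return abs(overlap) ** 2


if __name__ == "__main__":
    photon = plane(0.0)  # the state called 0 in the text
    for beta in (0.0, math.pi / 6, math.pi / 4):
        w1 = weight(plane(beta), photon)
        w2 = weight(plane(beta + math.pi / 2), photon)
        print(f"beta = {beta:.3f}: weights {w1:.3f} and {w2:.3f}")
    print("circular:", weight(LEFT, photon), weight(RIGHT, photon))
```

In each description the two weights add up to unity, and for the pair β, β + ½π they
reduce to cos²β and sin²β, in agreement with the probabilities found above for the
polariscope.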

In our polarizing experiment, if we choose to consider the incident photon as 
being partly in state α and partly in state α + ½π, the action of the polariscope is
then quite simple. It separates the two components α and α + ½π into two distinct
beams, so that after the photon has passed through we must say that it is partly
in one beam with the polarization α and partly in the other with the polarization
α + ½π. There is now no way of saying the photon is wholly in one state, without
a generalization of the meaning of a state, which will be made later. The simplest
description is the one just given, in which the photon is distributed over two
states. Other possible descriptions would require the photon to be distributed
over three or more states; e.g. one could say it is partly in the first beam with the
polarization α, partly in the second beam with the polarization β (arbitrary), and
partly in the second beam with the polarization β + ½π. Such descriptions would
not, however, be of value unless the beams were subsequently passed through other 
polarizing instruments. 

Let us consider now what happens when we determine the energy in one of the 
components. The result of such a determination must be either the whole photon 
or nothing at all. Thus the photon must change suddenly from being partly in 
one beam and partly in the other to being entirely in one of the beams. This 
sudden change may be counted as due to the disturbance of the photon which 
the observation necessarily makes. It is impossible to predict in which of the two 
beams the photon will be found. Only the probability of either result can be 
calculated from the previous distribution of the photon over the two beams. 

This way of describing the photon during the course of the experiment leads to 
one important conclusion, namely, the above-mentioned circumstance that when 
once the energy in one of the components has been determined, it will be impossible 
subsequently to bring about interference between the two components. When 
the photon is partly in one beam and partly in the other, if the two beams are 
superposed interference can take place, as the mathematical theory will show. This 
possibility disappears when the photon is forced entirely into one of the beams by 
the energy observation. The other beam then no longer enters into the description
of the photon, so that if any experiment is subsequently performed on the same 
photon it will count as being entirely in the one beam in the ordinary way. 

We have obtained a description of the photon throughout the experiment, which 
rests on a new rather vague idea of a photon being partly in one state and partly in 
another. The reader may, perhaps, feel that we have not really solved the difficulty 
of the conflict between the waves and the corpuscles, but have merely talked about 
it in a certain way and, by using some of the concepts of waves and some of 
corpuscles, have arrived at a formal account of the phenomena, which does not 
really tell us anything that we did not know before. The difficulty of the conflict 
between the waves and corpuscles is, however, actually solved as soon as one can 
give an unambiguous answer to any experimental question. The only object of 
theoretical physics is to calculate results that can be compared with experiment, 
and it is quite unnecessary that any satisfying description of the whole course of 
the phenomena should be given. 

With regard to the objection that the present description does not seem to take 
us any farther than we could, perhaps, have gone with very hazy notions of the 
relations between photons and electromagnetic waves, such as, for instance, those 
one had before the discovery of quantum mechanics, it should be remarked that 
the conclusion obtained above, that when once the energy of one of the beams has 
been measured subsequent interference between the beams would be impossible, 
could not have been drawn from very hazy notions, and also that the present 
discussion is really too qualitative for the advantages of the new theory to show 
up clearly. In §5 the discussion on the nature of light will be renewed on a slightly 
more quantitative basis, which will bring out definitely the difference between 
the present theory and the previous hazy notions. For many elementary optical 
experiments, moreover, the hazy notions would suffice to give answers to questions 
concerning the results of observations and in such cases quantum mechanics would 
not give any further information. The object of quantum mechanics is to extend 
the domain of questions that can be answered and not to give more detailed answers 
than can be experimentally verified. 


3. Superposition and Indeterminacy 


The new ideas that we have introduced in our description of the photon must be 
extended and applied to any atomic system, i.e. to any set of electrons and atomic 
nuclei interacting with each other and perhaps also with photons. We must first 
generalize the meaning of a ‘state’ so that it can apply to any atomic system. 
Corresponding to the case of the photon, which we say is in a given state of 
polarization when it has been passed through suitable polarizing apparatus, we say
that any atomic system is in a given state when it has been prepared in a given way, 
which may be repeated arbitrarily at will. The method of preparation may then 
be taken as the specification of the state. The state of a system in the general 
case includes any information that may be known about its position in space 
from the way in which it was prepared, as well as any information about its 
internal condition. 

We must now imagine the states of any system to be related in such a way 
that whenever the system is definitely in one state, we can equally well consider 
it as being partly in each of two or more other states. The original state must 
be regarded as the result of a kind of superposition of the two or more new 
states, in a way that cannot be conceived on classical ideas. Any state may 
be considered as the result of a superposition of two or more other states, and 
indeed in an infinite number of ways. Conversely any two or more states may be 
superposed to give a new state, even also when they refer to different positions of 
the system in space. Thus in our previous example of the polarization experiment, 
when the photon is partly in the one beam with the polarization α and partly in
the other with the polarization α + ½π, we may still count it as being entirely in a
certain single state. In fact it still satisfies the definition of having been prepared 
in a definite way which may be repeated at will. 

When a state is formed by the superposition of two other states, it will 
have properties that are in a certain way intermediate between those of the two 
original states and that approach more or less closely to those of either of them 
according to the greater or less ‘weight’ attached to this state in the superposition 
process. The new state is completely defined by the two original states when their 
relative weights in the superposition process are known, together with a certain 
phase difference, the exact meaning of weights and phases being provided in 
the general case by the mathematical theory of the next chapter. In the case 
of the polarization of a photon their meaning is that provided by classical optics, 
e.g. when two perpendicularly plane polarized states are superposed with equal 
weights, the new state may be circularly polarized in either direction, or linearly 
polarized at an angle ¼π, or else elliptically polarized, according to the phase
difference. This, of course, is true only provided the two states that are superposed
refer to the same beam of light, i.e. all that is known about the position and
momentum of a photon in either of these states must be the same for each. 
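
The dependence of the resulting state on the phase difference can be made concrete with
a short sketch based on classical optics. The representation of a polarization state by
two complex components, and the names superpose and classify, are assumptions introduced
for illustration only. The quantity s3 used below is the normalized circular component
of the classical Stokes description.

```python
import cmath
import math


def superpose(delta):
    """Equal-weight superposition of two perpendicular plane-polarized states,
    with phase difference delta between them."""
    a = 1 / math.sqrt(2)
    return (complex(a), a * cmath.exp(1j * delta))


def classify(state, tol=1e-9):
    """Return 'linear', 'circular' or 'elliptical' for a two-component state."""
    ex, ey = state
    norm = abs(ex) ** 2 + abs(ey) ** 2
    # s3 vanishes for linear polarization and equals +1 or -1 for circular.
    s3 = 2 * (ex.conjugate() * ey).imag / norm
    if abs(s3) < tol:
        return "linear"
    if abs(abs(s3) - 1) < tol:
        return "circular"
    return "elliptical"


if __name__ == "__main__":
    for delta in (0.0, math.pi / 4, math.pi / 2, math.pi, -math.pi / 2):
        print(f"phase difference {delta:+.3f}: {classify(superpose(delta))}")
```

The two signs of s3 in the circular case correspond to the two directions of circular
polarization; which sign is called left and which right is a matter of convention.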

It is convenient at this stage to modify slightly the meaning of the word 
‘state’ and to make it more precise. We must regard the state of a system as 
referring to its condition throughout an indefinite period of time and not to its 
condition at a particular time, which would make the state a function of the 
time. Thus a state refers to a region of 4-dimensional space-time and not to a 
region of 3-dimensional space. A system, when once prepared in a given state, 
remains in that state so long as it remains undisturbed. This does not, of course,
imply that it is not undergoing changes which could be revealed by experiment. 
In general it will be following out a definite course of changes, predictable by 
the quantum theory, belonging to that state. It is sometimes purely a matter of 
convenience whether we are to regard a system as being disturbed by a certain 
outside influence, so that its state gets changed, or whether we are to regard the 
outside influence as forming part of and coming in the definition of the system, 
so that with the inclusion of the effects of this influence it is still merely running 
through its course in one particular state. An illustration of this is our previous 
example of a photon being passed through a polariscope and becoming partly in 
each of two beams. Either we may consider the polariscope as disturbing the 
photon, so that after it has passed through it is in a different state; or else we may 
consider the polariscope as forming part of the ‘field’ in which the photon is moving, 
so that it is in the same state when it is in the incident beam as later when it is 
partly in each of the two component beams, and it is just following out its course in 
that state. The general laws of quantum mechanics apply equally well for either of 
these meanings of the state. There are, however, two cases when we are in general 
obliged to consider the disturbance as causing a change in state of the system, 
namely, when the disturbance is an observation and when it consists in preparing 
the system so as to be in a given state. 

With the new space-time meaning of a state we need a corresponding space-time 
meaning of an observation. This requires that the specification of an observation 
shall include a definite time at which the observation is to be made, or at which 
the apparatus used in making the observation is to be set in motion, relatively to 
the time when the system was prepared. It should be noticed that it has a meaning 
to consider an observation being made on a system in a given state before this state 
is prepared. If the system is prepared at time t₀, so that after time t₀ it is in a given
state, we can imagine what it would have to be like before time t₀ in order that,
if left undisturbed, it may become in the given state after time t₀. Thus we can
imagine the given state being produced backwards in time and can give a meaning
to an observation being made before time t₀ on the system in this state.

The introduction of indeterminacy into the results of observations, which 
we had to make in our discussion of the photon, must now be extended to 
the general case. When an observation is made on any atomic system that has 
been prepared in a given way and is thus in a given state, the result will not 
in general be determinate, i.e. if the experiment is repeated several times under 
identical conditions several different results may be obtained. If the experiment 
is repeated a large number of times it will be found that each particular result 
will be obtained a definite fraction of the total number of times, so that one can 
say there is a definite probability of its being obtained any time the experiment 
is performed. This probability the theory enables one to calculate. In special
cases this probability may be unity and the result of the experiment is then 
quite determinate. 

The indeterminacy in the results of observations is a necessary consequence of 
the superposition relationships that quantum mechanics requires to exist between 
the states. Suppose that we have two states A and B such that there exists 
an observation which, when made on the system in state A, is certain to lead to 
one particular result, and when made on the system in state B, is certain not 
to lead to this result. Two such states we call orthogonal. Suppose now that this 
observation is made on the system in a state formed by superposition of A and B. 
It is impossible for the result still to be determinate (except in the special case 
when the weight of A or B in the superposition process is zero). There must 
be a finite probability p that the result, that was certain for state A, will now be 
obtained and a finite probability 1 − p that it will not be obtained. By continuously
varying the relative weights in the superposition process we can get a continuous 
range of states, extending from pure A to pure B, for which the probability of 
the result, that was certain for state A, being obtained varies continuously from 
unity to zero. 
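
The continuous passage from certainty to impossibility may be sketched numerically.
The text has not yet stated how p depends on the weights; the rule used below, that
for two orthogonal states p equals the weight of A divided by the sum of the weights,
is an assumption anticipating the mathematical theory, and the function name is
hypothetical.

```python
def probability_of_first_result(weight_a, weight_b):
    """Probability of the result that is certain for state A, for a superposition
    of orthogonal states A and B with the given non-negative weights.

    Assumed rule (not derived in the text): p = weight_a / (weight_a + weight_b).
    """
    if weight_a < 0 or weight_b < 0 or weight_a + weight_b == 0:
        raise ValueError("weights must be non-negative and not both zero")
    return weight_a / (weight_a + weight_b)


if __name__ == "__main__":
    steps = 10
    for k in range(steps + 1):
        w_b = k / steps          # weight of B grows from 0 to 1
        w_a = 1 - w_b            # weight of A falls from 1 to 0
        p = probability_of_first_result(w_a, w_b)
        print(f"weight of A = {w_a:.1f}: p = {p:.3f}, 1 - p = {1 - p:.3f}")
```

Only the ratio of the two weights enters, so that multiplying both by a common factor
leaves p unchanged.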

It was mentioned above that an observation is not specified unless the time 
when it is made is given. In special cases it may so happen that the result 
of the observation, or the probability of any particular result being obtained, 
is independent of this time. If the state of the system is such that this is so 
for every observation that could be made on the system, then the state is said 
to be a stationary state and we should picture it as one in which the conditions 
are not varying. 

The possibility in quantum mechanics of superposing states to get new states 
is connected with the fact that in the mathematical theory the equations that 
define a state are linear in the unknowns. It is not unnatural that one should 
try to establish analogies with systems in classical mechanics (such as vibrating 
strings or membranes), which are governed by linear equations and for which, 
consequently, a superposition principle holds. Such analogies have led to the 
name ‘Wave Mechanics’ being sometimes given to quantum mechanics. It must be 
emphasized, however, that the superposition that occurs in quantum mechanics 
is of an essentially different nature from that occurring in the classical theory. 
The analogies are therefore very misleading. Their inadequacy may be seen 
from the following special case. Suppose one compares the states of an atomic 
system with the states of vibration of a membrane. If one superposes any state 
of the vibrating membrane with itself, the result is a new state of double the 
amplitude. On the other hand, if one superposes an atomic state with itself 
according to quantum mechanics, the resulting state will be precisely the same 
as the original one. There is nothing in the atomic case that is analogous to 
the absolute value of the amplitude, as distinct from the relative amplitudes of 
different points, of the vibrating membrane. 


4. Compatibility of Observations 


In general a system is disturbed when an observation is made on it, so that after 
the observation it is no longer in the same state as before. Only when the initial 
state and the observation are such that there is a probability unity, i.e. a certainty, 
for one particular result is it possible that the observation may produce no change 
of state. The necessity for this conclusion may be seen from the following argument. 

Suppose that there is a probability p for a given result being obtained from 
the observation. Consider one occasion on which this result was actually found 
and suppose the observation was repeated immediately afterwards on the system 
in the state in which it was left by the first observation. There must have been 
a probability unity for the given result being obtained a second time, since we may 
assume the system could not have changed in the infinitely short time between the 
two observations. Thus while the first state is such that there is a probability p for 
a given result from a certain observation, the second state (i.e. the one in which 
the system was left by the first observation) is such that there is a probability unity 
for this same result from a practically equivalent observation. Hence the second 
state must differ from the first when p differs from unity, since the probability of 
a result is quite definite for each state. It must be understood that the second 
state here considered is the one that arose on that particular occasion referred to 
above when the first observation was found to give the particular result desired. 
There will be a different second state corresponding to each different result for 
this observation. They must all be different from the initial state when p differs 
from unity. 
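
The argument above can be followed numerically in a minimal sketch. It rests on two assumptions that are not part of the argument itself: the probability rule introduced later in §8, and the supposition that the observation leaves the system in the component of the initial state along the state for which the result is certain. The helper names `inner`, `probability` and `state_after` are hypothetical.

```python
import math


def inner(a, b):
    """Scalar product sum(conj(a_k) * b_k) of two amplitude lists."""
    return sum(complex(x).conjugate() * y for x, y in zip(a, b))


def probability(state, result_state):
    """Probability that `state` yields the result that is certain for
    `result_state` (assumed rule: squared modulus of the normalized product)."""
    overlap = abs(inner(result_state, state)) ** 2
    return overlap / (inner(state, state).real * inner(result_state, result_state).real)


def state_after(state, result_state):
    """State left by the observation on an occasion when the result was found.

    Assumption: the system is left in the component of `state` along
    `result_state`; the overall numerical factor is irrelevant to the state.
    """
    amplitude = inner(result_state, state) / inner(result_state, result_state)
    if amplitude == 0:
        raise ValueError("this result has zero probability for the given state")
    return [amplitude * x for x in result_state]


result_state = [1.0, 0.0]
initial = [math.cos(math.pi / 3), math.sin(math.pi / 3)]  # p = 1/4 for the result

p_first = probability(initial, result_state)
second = state_after(initial, result_state)
p_second = probability(second, result_state)
print(p_first, p_second)
```

In this sketch the first probability is one quarter and the second is unity, so the second state cannot be the same as the first, in agreement with the conclusion drawn above.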

Hence when once an observation of a system in a given state has been made, 
one cannot in general make a second observation and suppose it to apply to 
the same state. The first observation spoils the state of the system, which must 
then be prepared again before one can make the second. The two observations 
may, however, be such that, although the first one alters the state of the system, 
yet it does so in such a way as not to make any difference to the probability of any 
given result being obtained with the second. By the probability of a given result 
being obtained with the second is here meant its probability at the beginning 
of the experiment, before one knows what the result of the first observation is, 
and not its probability after a particular result has been obtained with the first 
observation. Two observations for which this is so when they are made (or at least 
when the first is made) with the minimum of disturbance allowed by theory, which 
can be attained in practice only under the most favourable conditions, are called 
compatible. Three or more observations are called compatible when any two are 
compatible. Two or more observations may be compatible only with respect to 
one particular state as initial state before any of the observations, or they may 
be compatible with respect to all initial states. In future when it is said that two 
or more observations are compatible, the second alternative is to be understood 
unless the contrary is stated. 

The condition for the compatibility of two observations is, according to the laws 
of quantum mechanics, a symmetrical condition between them. If one of two 
compatible observations, $\alpha_1$ say, is made at the time $t_1$, and the other, $\alpha_2$ say, 
at the time $t_2$ which is later than $t_1$, then, according to the definition given above, 
the probability of a given result being obtained for $\alpha_2$ must be the same whether 
this observation is made on the system in the initial state or in the state ensuing 
after observation $\alpha_1$. The symmetry condition now requires that the probability of 
a given result being obtained for $\alpha_1$ must be the same whether this observation 
is made on the system in the initial state or in the state ensuing after observation 
$\alpha_2$, it being necessary to suppose this latter state, which is prepared at time $t_2$, 
to be produced backwards in time, in the way mentioned in the preceding section, 
in order that the observation $\alpha_1$ at time $t_1$ may be made on it. By the probability 
of a result for the state ensuing after a certain observation, is meant in each case 
the average probability for each state that can ensue after this observation, each 
of these states being weighted in the averaging process with the probability that 
it does ensue after this observation. 
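
The weighted average just described may be sketched numerically for a system with two independent states. The sketch assumes the probability rule of §8 and supposes that the first observation is a maximum one, so that the states that can ensue after it are simply the states for which its results are certain; the names `probability_after_first`, `first_results` and `second_result` are hypothetical.

```python
import math


def inner(a, b):
    """Scalar product sum(conj(a_k) * b_k) of two amplitude lists."""
    return sum(complex(x).conjugate() * y for x, y in zip(a, b))


def probability(state, result_state):
    """Squared modulus of the product, for normalized amplitude lists
    (assumed probability rule)."""
    return abs(inner(result_state, state)) ** 2


def probability_after_first(state, first_results, second_result):
    """Average probability of `second_result` over the states that can ensue
    after the first observation, each weighted by the probability that it
    does ensue."""
    return sum(
        probability(state, ensuing) * probability(ensuing, second_result)
        for ensuing in first_results
    )


r = 1 / math.sqrt(2)
first_results = [[1.0, 0.0], [0.0, 1.0]]  # states ensuing after the first observation
second_result = [r, r]                    # state for which a result of the second is certain

for initial in ([1.0, 0.0], [r, r]):
    direct = probability(initial, second_result)
    averaged = probability_after_first(initial, first_results, second_result)
    print(initial, round(direct, 3), round(averaged, 3))
```

For the first initial state the two probabilities coincide, while for the second they differ, so that in this sketch the two observations are compatible with respect to one particular initial state only and not with respect to all initial states.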

It has been pointed out that the state of a system after any observation has 
been made on it is such that this observation, if made on the system in this final 
state, would for a certainty give one particular result. Suppose now that a number 
of compatible observations $\alpha_1, \alpha_2, \ldots$ are made on the system. Then the final state 
must be such that, if any of the observations $\alpha_r$ is made on the system in this final 
state, there will be a certainty for one particular result, since there was a certainty 
for one particular result as soon as the observation $\alpha_r$ was made in the preparation 
of the final state, and this will not be affected by the subsequent observations 
$\alpha_{r+1}, \alpha_{r+2}, \ldots$, owing to the compatibility condition. The existence of states for which 
the result of any of the observations is a certainty forms one of the main properties 
of compatible observations. The order of the observations need not, of course, be 
their order in time, since we are allowed to consider an observation being made on 
a state before it is prepared. 

The case of greatest interest of the compatibility of two observations is when 
they both refer to the same instant of time. The compatibility condition is now 
that if either is made a very short time before the other, the probability of any 
given result being obtained with the second shall be the same as if the first had 
not been made. 

It is often convenient to count two or more compatible observations, 
particularly when they are simultaneous, as a single observation, the result 
of such an observation being expressible by two or more numbers. We shall 
frequently have to consider the greatest possible number of independent compatible 
simultaneous observations being made on a system and shall, for brevity, call such 
a set of observations a maximum observation. When a maximum observation is 
made on a system, its subsequent state is completely determined by the result of 
the observation and is independent of its previous state. This may be considered 
as an axiom, or as a more precise definition of a state. 

The state of a system after a maximum observation has been made on it is 
such that there exists a maximum observation (namely, an immediate repetition 
of the maximum observation already made) which, when made on the system in 
this state, will for a certainty lead to one particular result (namely, the previous 
result over again). Any state can be specified only as the state ensuing after a given 
maximum observation has been made for which a given result was obtained, or 
in some equivalent way. We can therefore draw the conclusion that for any state 
there must exist one maximum observation which will for a certainty lead to one 
particular result, and conversely, if we consider any possible result of a maximum 
observation, there must exist a state of the system for which this result for the 
observation will be obtained with certainty. 


5. Further Discussion on Photons 


When quantum mechanics is applied to a system composed of simply a freely 
moving corpuscle, the equations that define a state of the system are, as we shall 
find from the mathematical theory, the ordinary equations for wave motion. It is 
this circumstance that gives to the corpuscle many of the properties of waves and 
allows us to consider a corpuscle in a given state as associated with, or controlled 
by, a given wave. In order to show more definitely the nature of the relations 
between the waves and the corpuscle, a typical example will be given of the conflict 
between the wave and the corpuscular theories of light and of the solution which 
quantum mechanics provides. 

Consider a beam of light to be split into two components of equal intensity, 
which are made to interfere. According to the old corpuscular theory we would 
say that each of the two components contains an equal number of photons and 
we should then require that a photon in one component could interfere with one in 
the other. Under certain conditions they would have to annihilate one another, and 
under others to produce four photons. This contradicts the idea of photons being 
discrete particles and is, besides, in disagreement with the conservation of energy, 
which should hold for each process in detail and not be merely statistically true. 

The answer that quantum mechanics gives to the difficulty is that one should 
consider each photon to go partly into each of the two components, in the 
way allowed by the idea of the superposition of states. Each photon then 
interferes only with itself. Interference between two different photons can never 
occur. The solution of Maxwell’s equations that forms the wave picture of the 
phenomenon represents one of the photons and not the whole assembly of photons. 
The relative intensities that this solution gives for the light at different points 
determine the relative probabilities of that photon being found at these points 
when an experiment is made to find its position. Only the relative intensities at 
different points are of importance; the absolute intensity has no interpretation. 
One must not try to establish any connexion between the absolute intensity of the 
waves and the total number of particles, which is in stark contrast† to the older 
ideas of the relations between waves and particles. 

The quantum-mechanical views do not, of course, get over the difficulty of 
enabling us to picture something having properties between those of waves and 
corpuscles, but they serve to remind us, by their way of saying a photon is 
partly in one component and partly in the other, of the close connexion between 
the components and so prevent us from intuitively drawing wrong conclusions, 
as we do on the older views when we picture each component as having its 
own photons. For instance, we are reminded, by the requirement that the total 
probability of a photon being anywhere must be and must remain unity, that in 
whatever way the two component beams interfere, if they neutralize each other 
in one place they must reinforce each other in another so that conservation of 
energy is preserved. We thus get into no difficulty with the detailed conservation 
of energy. 
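
The remark that neutralization in one place must be accompanied by reinforcement in another can be checked in a minimal sketch. The arrangement is an assumption made purely for illustration: a single photon divided with equal amplitudes between two components, one of which acquires a relative phase, the two then being recombined by a lossless symmetric mixing into two outgoing beams. The function name `output_probabilities` is hypothetical.

```python
import cmath
import math


def output_probabilities(phase_difference):
    """Probabilities of finding the photon in each of two outgoing beams.

    Assumptions (illustrative, not from the text): equal amplitudes in the
    two components, a relative phase on the second, and recombination by
    the sum and difference of the amplitudes divided by sqrt(2).
    """
    a = 1 / math.sqrt(2)
    b = cmath.exp(1j * phase_difference) / math.sqrt(2)
    out_plus = (a + b) / math.sqrt(2)
    out_minus = (a - b) / math.sqrt(2)
    return abs(out_plus) ** 2, abs(out_minus) ** 2


for delta in (0.0, math.pi / 2, math.pi):
    p_plus, p_minus = output_probabilities(delta)
    total = p_plus + p_minus
    print(f"delta = {delta:.3f}  p_plus = {p_plus:.3f}  p_minus = {p_minus:.3f}  total = {total:.3f}")
```

Whatever the phase difference, the two probabilities add up to unity: when the components neutralize each other in one outgoing beam the photon is certain to be found in the other.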


6. Definition of Superposition 


A definition of the superposition of states will now be given. We say that a state 
A may be formed by a superposition of states B and C when, if any observation is 
made on the system in state A leading to any result, there is a finite probability for 
the same result being obtained when the same observation is made on the system 
in one (at least) of the two states B and C. The Principle of Superposition says 
that any two states B and C may be superposed in accordance with this definition 
to form a state A and indeed an infinite number of different states A may be formed 
by superposing B and C in different ways. This principle forms the foundation 
of quantum mechanics. It is completely opposed to classical ideas, according to 
which the result of any observation is certain and for any two states there exists 
an observation that will certainly lead to two different results. 

† Original: sharp distinction 

From our definition of superposition some elementary theorems follow 
immediately. For example, the states B and C themselves are particular cases of 
states formed by superposition of B and C. Again, if we superpose two states A and 
B obtaining a state P, which is then superposed on another state C, the resulting 
state Q will have the property that, if any observation is made on the system in 
this state leading to any result, there will be a finite probability of this same result 
being obtained when the observation is made on the system in one of the two 
states P and C, and hence there must be a finite probability of this result being 
obtained when the observation is made on the system in one of the three states A, 
B and C. Thus the property possessed by the state Q is symmetrical in the three 
states A, B and C, so that when superpositions are made successively their order 
is unimportant. This, of course, is necessary for the word ‘superposition’ to be 
suitable for describing the relations between the states. 

Another example of a deduction from the definition of superposition is 
the following: If an observation of the system in a state A is certain to lead to one 
particular result and if this observation for another state B is certain to lead to 
the same result, then the observation is also certain to lead to this result for any 
state obtained by superposition of A and B. This is because it cannot lead to any 
other result, as the probability of this other result for both the states A and B 
is zero. 

One could proceed to build up the theory of quantum mechanics on the basis of 
these ideas of superposition with the introduction of the minimum number of new 
assumptions necessary. Although this would be the logical line of development, 
it does not appear to be the most convenient one, as the laws of quantum mechanics 
are so closely interconnected that it would not be easy, and would in any case 
be somewhat artificial, to separate out the barest minimum of assumptions from 
which the rest could be deduced. The method that will be here followed will 
therefore be first to give all the simple general laws in the form in which they are 
most easily expressed and remembered, and then to work out their consequences. 
This will mean that we shall continually be deducing results that are obviously 
necessary for the physical meaning of the theory to be tenable, or that follow from 
the foregoing ideas of superposition. Such deductions will then merely show the 
reasonableness and self-consistency of our fundamental assumptions. 


II. SYMBOLIC ALGEBRA OF 
STATES AND OBSERVABLES 


7. Addition of States 


We introduce certain symbols which we say denote physical things such as states 
of a system or dynamical variables. These symbols we shall use in algebraic 
analysis in accordance with certain axioms which will be laid down. To complete 
the theory we require laws by which any physical conditions may be expressed 
by equations between the symbols and by which, conversely, physical results 
may be inferred from equations between the symbols. A typical calculation 
in quantum mechanics will now run as follows: One is given that a system 
is in a certain state in which certain dynamical variables have certain values. 
This information is expressed by equations involving the symbols that denote 
the state and the dynamical variables. From these equations other equations are 
then deduced in accordance with the axioms governing the symbols and from 
the new equations physical conclusions are drawn. One does not anywhere specify 
the exact nature of the symbols employed, nor is such specification at all necessary. 
They are used all the time in an abstract way, the algebraic axioms that they satisfy 
and the connexion between equations involving them and physical conditions being 
all that is required. The axioms, together with this connexion, contain a number 
of physical laws, which cannot conveniently be analysed or even stated in any 
other way. 

We denote each state of a dynamical system by a symbol $\psi$. Different states 
may be distinguished by suffixes, e.g. $\psi_1$, $\psi_2$, $\psi_3$. If a state $\psi_0$ may be formed by 
superposition of the states $\psi_1$ and $\psi_2$, we express this relation between the states 
by an equation of the type 

$$\psi_0 = c_1\psi_1 + c_2\psi_2, \tag{1}$$


where $c_1$ and $c_2$ are numbers, which may be imaginary or complex. The different 
states that may be formed by the superposition of $\psi_1$ and $\psi_2$ are given by different 
coefficients $c_1$, $c_2$. Any two $\psi$-symbols denoting any two states may be added in 
this way with arbitrary coefficients $c_1$ and $c_2$ and the sum will always be another 
$\psi$-symbol denoting a state that can be formed by superposition of these two states, 
except in the special case when this sum is zero. The usual algebraic axioms of 
addition are assumed to hold, i.e. the commutative axiom 


$$c_1\psi_1 + c_2\psi_2 = c_2\psi_2 + c_1\psi_1$$


and the associative axiom 


$$(c_1\psi_1 + c_2\psi_2) + c_3\psi_3 = c_1\psi_1 + (c_2\psi_2 + c_3\psi_3).$$


The first of these axioms implies that superposition of two states is a symmetrical 
process between them, which is obvious from the definition of §6, while the second 
implies the theorem, which was proved in §6, that in successive superpositions 
the order is unimportant. 

Our assumptions so far are thus consistent with the definition of superposition. 
They do, however, go farther than this definition and contain new physical laws. 
For example, we can infer that if the state $\psi_0$ may be formed by superposition of 
$\psi_1$ and $\psi_2$ so that equation (1) holds, then (provided $c_1 \neq 0$) $\psi_1$ may be formed 
by superposition of $\psi_0$ and $\psi_2$. The condition of superposition (1) is, in fact, 
symmetrical between $\psi_0$, $\psi_1$ and $\psi_2$. This could not have been deduced from 
the definition of superposition in §6. When three states are symmetrically related 
in this way, we say that they are dependent. We can extend the definition and say 
that any number of states $\psi_1, \psi_2, \ldots, \psi_n$ are dependent or independent according 
to whether there is or is not a relation between them of the type 


$$c_1\psi_1 + c_2\psi_2 + \cdots + c_n\psi_n = 0. \tag{2}$$
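
The inference that the second state may be formed by superposition of the other two is a one-line rearrangement of equation (1). It may be written out as follows, assuming only the ordinary rules of addition and of multiplication by numbers laid down above.

```latex
% Solving (1) for \psi_1, which requires c_1 \neq 0:
\psi_1 = \frac{1}{c_1}\,\psi_0 - \frac{c_2}{c_1}\,\psi_2 .

% Equivalently, (1) is itself a relation of the type (2) among the three
% states, with coefficients -1, c_1 and c_2:
-\psi_0 + c_1\psi_1 + c_2\psi_2 = 0 .
```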





It has been mentioned that when a state is superposed on itself, the resulting 
state is the same as the original one. Thus our symbolic scheme should be such 
that $\psi_1 + \psi_1$ or $2\psi_1$ denotes the same state as $\psi_1$. Actually we make a more general 
assumption than this, namely, that $c\psi_1$ denotes the same state as $\psi_1$, where $c$ is any 
number, not zero, and can be imaginary or complex. The nature of the connexion 
between the states and the symbols $\psi$ required by this assumption may perhaps 
be more easily understood if one pictures the $\psi$'s as vectors in some space with 
a sufficiently large number of dimensions. The number of dimensions required is 
equal to the number of independent states that the system has, which is in general 
infinite. An equation of the type (1) or (2) can now be regarded as a vector 
equation. The vectors are, of course, in general complex. A state must now be 
considered as completely specified by the direction of a vector. Vectors of different 
lengths and the same direction specify the same state. 
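
The statement that a state is specified by the direction of a vector alone may be made concrete in a minimal sketch, in which a state of a system with finitely many independent states is represented by a list of complex components. The function name `same_state` and the numerical tolerance are illustrative assumptions.

```python
def same_state(a, b, tol=1e-12):
    """True when b = c*a for some non-zero complex number c.

    Proportionality is tested through the vanishing of every
    a[i]*b[j] - a[j]*b[i]; the tolerance is an illustrative choice.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of components")
    if all(abs(x) <= tol for x in a) or all(abs(y) <= tol for y in b):
        raise ValueError("the zero vector denotes no state")
    n = len(a)
    return all(
        abs(a[i] * b[j] - a[j] * b[i]) <= tol
        for i in range(n)
        for j in range(i + 1, n)
    )


psi = [1 + 2j, 0.5j, -3]
scaled = [(2 - 1j) * x for x in psi]  # different length and phase, same direction
other = [1, 0, 0]
print(same_state(psi, scaled), same_state(psi, other))
```

The test involves only ratios of components, so that it is unaffected when either vector is multiplied by any number other than zero, real or complex.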

We now introduce another set of symbols $\phi_1, \phi_2, \ldots$ also denoting states. Any 
state denoted by a $\psi$-symbol $\psi_r$ can be equally well denoted by a $\phi$-symbol $\phi_r$ 
having the same suffix. When the $\psi$'s that denote three states satisfy (1), the $\phi$'s 
that denote these states are assumed to satisfy 


$$\phi_0 = \bar{c}_1\phi_1 + \bar{c}_2\phi_2, \tag{3}$$


where the bar over a number denotes its conjugate complex. The $\phi$'s are also 
assumed to satisfy the commutative and associative laws of addition and to have 
all the other properties that the $\psi$'s have, e.g. $c\phi_1$ denotes the same state as $\phi_1$, 
and we may define a number of states denoted by $\phi_1, \phi_2, \ldots, \phi_n$ to be independent 
when there is no relation between them of the type 


$$c_1\phi_1 + c_2\phi_2 + \cdots + c_n\phi_n = 0.$$





The theory will throughout be symmetrical between the $\phi$'s and $\psi$'s. The sum of 
a $\phi$ and a $\psi$ has no meaning and will never appear in the analysis. 

The introduction of a second set of symbols to denote the states may appear to 
be superfluous, but actually it is necessary when one allows complex coefficients $c_r$, 
in order to preserve the symmetry between the two square roots of $-1$. A superposition 
process such as (1), which is specified by the two complex numbers $c_1$ and $c_2$, must 
be equally well specifiable by the conjugate complex numbers $\bar{c}_1$ and $\bar{c}_2$ so that we 
are obliged to introduce equation (3) and treat it on the same footing as (1). 

We have seen that a $\phi$- or $\psi$-symbol may be multiplied by an arbitrary number 
and then still denotes the same state. Thus we can put 


$$\psi_r = a_r\psi_r^{*}, \qquad \phi_s = b_s\phi_s^{*}, \tag{4}$$


where the $a$'s and $b$'s are arbitrary numbers, not zero, and consider the $\psi^{*}$'s and 
$\phi^{*}$'s as denoting the states instead of the $\psi$'s and $\phi$'s. The $a$'s and $b$'s must, 
however, satisfy certain conditions in order that the connexion between equations 
(1) and (3) may hold also for the starred symbols. These equations give¹ 


$$\psi_0^{*} = (c_1 a_1/a_0)\,\psi_1^{*} + (c_2 a_2/a_0)\,\psi_2^{*},$$
$$\phi_0^{*} = (\bar{c}_1 b_1/b_0)\,\phi_1^{*} + (\bar{c}_2 b_2/b_0)\,\phi_2^{*}.$$


In order that the coefficients in the $\phi^{*}$ equation may be conjugate complex to the 
coefficients in the $\psi^{*}$ equation we must have 


$$b_1/b_0 = \bar{a}_1/\bar{a}_0, \qquad b_2/b_0 = \bar{a}_2/\bar{a}_0.$$

Hence 

$$b_r = f\bar{a}_r, \tag{5}$$


where $f$ is a number independent of $r$. 





¹ Dirac uses a ‘.’ to separate two factors when bracketed juxtaposition would be clearer. 
Later the ‘.’ is replaced by a ‘.’. 

The connexion between equations (1) and (3), and the condition (5) governing 
the most general transformation (4) that preserves this connexion, lead one 
to consider each $\phi_r$ as being proportional to the conjugate imaginary quantity of 
the corresponding $\psi_r$, the proportionality becoming an equality if a transformation 
of the type (4) & (5) is applied with the correct value for $f$. Thus if we adopt 
the vector picture of the $\psi$'s we may take each $\psi_r$ to be the conjugate imaginary 
vector to the corresponding $\phi_r$. It should be remarked, though, that the conjugate 
imaginariness of the $\psi$'s and $\phi$'s is not of quite the same nature as that of ordinary 
complex numbers, since we cannot give any meaning to the splitting up of a $\psi$ 
into its real and imaginary² parts. In the splitting up of an ordinary complex 
quantity into its real and imaginary parts, we obtain the real part by taking 
the average of the quantity itself and its conjugate imaginary, but we cannot 
do this for a $\psi$-symbol since we are not allowed to add together a $\psi$ and a $\phi$. 
Thus the relation between a $\psi$ and the corresponding $\phi$ is not quite the same 
as the relation between two conjugate imaginary numbers, and in order that 
this difference may be remembered we shall reserve the words conjugate imaginary 
for describing relations between $\psi$'s and $\phi$'s and use the words conjugate complex 
instead for quantities such as numbers which can be split up into real and imaginary 
parts. Ordinary vectors, of course, like numbers, can be split up into real and 
imaginary parts, so that the picturing of $\psi$'s and $\phi$'s as vectors is not strictly 
correct, although it is all the same sometimes useful. We must therefore remember, 
when using the vector picture, that, in so far as it would allow one to add together 
two vectors representing a $\psi$ and a $\phi$ respectively, it is imperfect and gives to 
the $\psi$'s and $\phi$'s more properties than quantum mechanics requires or allows. 


8. Multiplication of States 


Up to the present the only functions of the $\psi$'s and $\phi$'s that we have allowed 
are linear functions of the $\psi$'s alone, or of the $\phi$'s alone, with numerical 
coefficients. We now suppose that any $\psi$ and $\phi$ have a product, which is a number, 
in general complex. This product must always be written $\phi\psi$, i.e. the $\phi$ must be 
on the left-hand side and the $\psi$ on the right. Products such as $\psi\phi$, $\psi_1\psi_2$ and $\phi_1\phi_2$ 
have no meaning and will never appear in the analysis. 

The products $\phi\psi$ are assumed to satisfy the distributive axiom of 
multiplication, i.e. 


$$(\phi_1 + \phi_2)\psi = \phi_1\psi + \phi_2\psi,$$
$$\phi(\psi_1 + \psi_2) = \phi\psi_1 + \phi\psi_2, \tag{6}$$


together with the axiom that 


$$\phi(c\psi) = (c\phi)\psi = c(\phi\psi), \tag{7}$$


where $c$ is any number. In the vector picture we can take the number $\phi\psi$ to be the 
scalar product of the two vectors $\phi$ and $\psi$. The conditions (6) and (7) are then 
satisfied. The vector picture, however, allows us also to form the products $\phi_1\phi_2$ 
and $\psi_1\psi_2$. Thus we again find the vector picture giving more properties to the $\psi$'s 
and $\phi$'s than required in quantum mechanics. 

² The ‘pure’ is omitted as being unnecessary. 

In conformity with our view of regarding a $\psi$ and the corresponding $\phi$ as 
conjugate imaginary quantities, we now make the following two assumptions: 


$$\phi_r\psi_s = \overline{\phi_s\psi_r}, \tag{8}$$
$$\phi_r\psi_r > 0. \tag{9}$$


From the first of these, by taking $s = r$, we can deduce that $\phi_r\psi_r$ is real. The second 
now states that $\phi_r\psi_r$ is positive. To examine the legitimacy of these assumptions, 
let us consider the effect of a transformation of the type (4) & (5). Equation (8) 
gives 

$$f\bar{a}_r a_s\,\phi_r^{*}\psi_s^{*} = \overline{f\bar{a}_s a_r\,\phi_s^{*}\psi_r^{*}}$$

and the inequality (9) gives 

$$f\bar{a}_r a_r\,\phi_r^{*}\psi_r^{*} > 0.$$


From these relations we obtain 

$$\phi_r^{*}\psi_s^{*} = \overline{\phi_s^{*}\psi_r^{*}}, \qquad \phi_r^{*}\psi_r^{*} > 0,$$


provided $f$ is real and positive. Thus a restriction must be imposed on 
the transformations (4) & (5) in order that (8) and (9) may remain invariant. 
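
The step from the two transformed relations to the restriction on the number f may be spelled out. The following is a sketch of the intermediate algebra, using only (4), (5), (8) and (9) and the fact that the a's are not zero.

```latex
% Substitute \psi_s = a_s\psi_s^{*} and \phi_r = b_r\phi_r^{*} = f\bar{a}_r\phi_r^{*} into (8):
f\bar{a}_r a_s\,\phi_r^{*}\psi_s^{*}
  = \overline{f\bar{a}_s a_r\,\phi_s^{*}\psi_r^{*}}
  = \bar{f}\,a_s\bar{a}_r\,\overline{\phi_s^{*}\psi_r^{*}}
\quad\Longrightarrow\quad
\phi_r^{*}\psi_s^{*} = \frac{\bar{f}}{f}\,\overline{\phi_s^{*}\psi_r^{*}} .

% (8) keeps its form for the starred symbols only if \bar{f} = f, i.e. f is real.
% Substituting into (9):
f\,|a_r|^{2}\,\phi_r^{*}\psi_r^{*} > 0 ,

% so that, since |a_r|^{2} > 0, the starred form \phi_r^{*}\psi_r^{*} > 0
% holds only if f is also positive.
```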

In future we shall keep to the view that each $\phi$ is equal to, and not merely 
proportional to, the conjugate imaginary of the corresponding $\psi$, as the more 
general view, which is theoretically permissible, does not lead to anything of 
interest. This means that our equations need be invariant under transformations 
of the type (4) only provided $b_r = \bar{a}_r$, i.e. provided in (5) $f = 1$. The restriction 
on the transformations of the type (4) which is necessary for (8) and (9) to be 
invariant is included in this one. 

We shall often assume that a $\psi_r$ and the conjugate imaginary $\phi_r$ satisfy 


$$\phi_r\psi_r = 1,$$


when they will be called normalized to unity, or simply normalized. The inequality 
(9) shows that it is always possible to normalize a $\psi$ or a $\phi$ by multiplying it by a 
number. The modulus of this number is determined but not its argument. 
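
Normalization may be sketched in the vector picture, a state being represented by a list of complex components and the product of a symbol with its conjugate imaginary by the sum of the squared moduli of the components. The helper names `inner` and `normalize` are hypothetical, and the choice of a real positive factor is only one of the arguments that the text leaves open.

```python
import cmath
import math


def inner(a, b):
    """Scalar product sum(conj(a_k) * b_k), standing for the product phi_a psi_b."""
    return sum(complex(x).conjugate() * y for x, y in zip(a, b))


def normalize(psi):
    """Return psi multiplied by the positive number 1/sqrt(phi psi)."""
    norm_sq = inner(psi, psi).real  # real and positive, as required by (8) and (9)
    if norm_sq <= 0:
        raise ValueError("the zero vector denotes no state")
    factor = 1 / math.sqrt(norm_sq)
    return [factor * complex(x) for x in psi]


psi = [3 + 4j, 0, 12j]
unit = normalize(psi)
# A factor of the same modulus but different argument normalizes equally well.
rotated = [cmath.exp(0.7j) * x for x in unit]
print(inner(unit, unit).real, inner(rotated, rotated).real)
```

Both `unit` and `rotated` are normalized, which illustrates that only the modulus of the normalizing number is fixed.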

A corollary of (9) is that if, for all $\psi$, 


$$\phi_r\psi = 0,$$

then 

$$\phi_r = 0. \tag{10}$$


This follows from the fact that if $\phi_r$ is not identically zero, its conjugate 
imaginary $\psi_r$ will be a $\psi$ that does not satisfy $\phi_r\psi = 0$. There is, of course, 
also the corresponding theorem with $\phi$'s and $\psi$'s interchanged. 

The theorem will now be proved that if $\phi_r$ and $\psi_s$ are normalized, then 


$$|\phi_r\psi_s| \leq 1, \tag{11}$$


the case of equality occurring only when $\phi_r$ and $\psi_s$ denote the same state. Let $\alpha$ 
be any real number and apply the inequality (9) to the state denoted by $\psi_r - e^{i\alpha}\psi_s$ 
or $\phi_r - e^{-i\alpha}\phi_s$. This gives 


$$(\phi_r - e^{-i\alpha}\phi_s)(\psi_r - e^{i\alpha}\psi_s) > 0,$$
$$\phi_r\psi_r - e^{i\alpha}\phi_r\psi_s - e^{-i\alpha}\phi_s\psi_r + \phi_s\psi_s > 0.$$


Hence, using the normalizing conditions $\phi_r\psi_r = \phi_s\psi_s = 1$, we obtain 


$$e^{i\alpha}\phi_r\psi_s + e^{-i\alpha}\phi_s\psi_r < 2.$$


The second term on the left-hand side is just the conjugate complex of the first. 
Hence the real part of $e^{i\alpha}\phi_r\psi_s$ is less than unity. Since this must hold for all values 
of $\alpha$ we must have the modulus of $\phi_r\psi_s$ less than unity. This gives the required 
result (11), when we take into account the fact that the inequality becomes 
an equality if $\psi_r - e^{i\alpha}\psi_s = 0$ for some value of $\alpha$, which means that $\psi_r$ and $\psi_s$ 
denote the same state. 
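
The decisive step of the proof, the passage from the real part holding for all values of the real number to the modulus, may be checked numerically in the vector picture. In the following minimal sketch the two states are arbitrary illustrative choices and the helper names `inner` and `normalize` are hypothetical.

```python
import cmath
import math


def inner(a, b):
    """Scalar product sum(conj(a_k) * b_k), standing for the product phi_a psi_b."""
    return sum(complex(x).conjugate() * y for x, y in zip(a, b))


def normalize(psi):
    """Return psi divided by the square root of its product with its conjugate imaginary."""
    factor = 1 / math.sqrt(inner(psi, psi).real)
    return [factor * complex(x) for x in psi]


psi_r = normalize([1, 1j, 0])
psi_s = normalize([1 + 1j, 2, 1])
product = inner(psi_r, psi_s)  # plays the part of phi_r psi_s

# Real part of e^{i alpha} phi_r psi_s over a grid of alpha, and at the
# maximizing value alpha = -arg(phi_r psi_s).
grid = [2 * math.pi * k / 360 for k in range(360)]
largest_on_grid = max((cmath.exp(1j * a) * product).real for a in grid)
best_alpha = -cmath.phase(product)
largest = (cmath.exp(1j * best_alpha) * product).real
print(abs(product), largest, largest_on_grid)
```

The greatest value that the real part can take is the modulus itself, so that a bound on the real part for every value of the real number is a bound on the modulus, as the proof asserts.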

Our introduction of products of $\phi$'s with $\psi$'s has so far been entirely 
a mathematical question, with no physical implications. A physical meaning will 
now be given to the product $\phi_r\psi_s$. Consider that maximum observation of the state 
$\phi_r$ for which there is a certainty of a particular result being obtained. We have 
seen that such a maximum observation always exists. Suppose now this maximum 
observation to be made on the system in the state $\psi_s$. There will be a certain 
probability of the same result being obtained, which we call the probability of 
agreement of $\psi_s$ with $\phi_r$. It is a number that depends only on the two states $\psi_s$ 
and $\phi_r$. In particular it is unity if $\psi_s$ is the same state as $\phi_r$. We now assume that 
the probability of agreement of $\psi_s$ with $\phi_r$ is equal to $|\phi_r\psi_s|^2$ when $\phi_r$ and $\psi_s$ are 
normalized. It has just been proved that this value for the probability can never 
exceed unity, so that the assumption is reasonable. Again, the only transformation 
of the type (4) that one can make on a normalized $\phi$ or $\psi$ without destroying 
its normalization is multiplication by a number of modulus unity. This will not 
change the value of $|\phi_r\psi_s|$ which thus has the necessary invariance for its physical 
meaning to be permissible. 

When we give this physical meaning to the product of a $\phi$ and a $\psi$ the axioms 
and assumptions (6), (7), (8) & (9) become, to a certain extent, physical laws, as 
physical consequences can now be deduced from them. For instance, from (8) one 
can deduce that the probability of agreement of $\psi_r$ with $\psi_s$ equals that of $\psi_s$ with 
$\psi_r$. Again, from (6) and (7) one can calculate how the probability of agreement of 
a state $\psi_0$ with a state $c_1\psi_1 + c_2\psi_2$ formed by the superposition of $\psi_1$ and $\psi_2$ varies 
with the coefficients $c_1$ and $c_2$. Let us take the case when $\psi_1$ and $\psi_2$ are orthogonal, 
i.e. when there exists an observation which is certain to lead to different results for 
the two states, so that their probability of agreement is zero. This requires that 


$$\phi_1\psi_2 = 0, \qquad \phi_2\psi_1 = 0.$$


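In order that $c_1\psi_1 + c_2\psi_2$ may be normalized as well as $\psi_1$ and $\psi_2$ we must have 


$$\begin{aligned}
1 &= (\bar{c}_1\phi_1 + \bar{c}_2\phi_2)(c_1\psi_1 + c_2\psi_2) \\
  &= |c_1|^2\,\phi_1\psi_1 + |c_2|^2\,\phi_2\psi_2 \\
  &= |c_1|^2 + |c_2|^2.
\end{aligned}$$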


If we now take $\psi_0$ orthogonal to $\psi_2$, we find for the probability of agreement of $\psi_0$ 
with $c_1\psi_1 + c_2\psi_2$ the value 


$$|\phi_0(c_1\psi_1 + c_2\psi_2)|^2 = |\phi_0 c_1\psi_1|^2 = |c_1|^2\,|\phi_0\psi_1|^2,$$


which is $|c_1|^2$ times the probability of agreement of $\psi_0$ with $\psi_1$. This result as it 
stands is not a physical one, since we have no other physical meaning for $|c_1|^2$ which 
we can equate to the ratio of the probability of agreement of $\psi_0$ with $c_1\psi_1 + c_2\psi_2$ 
to that of $\psi_0$ with $\psi_1$. The fact that this ratio is independent of the state $\psi_0$ 
provided it is orthogonal to $\psi_2$ is, however, a physical result and is an example of 
the physical conclusions contained in the axioms (6) and (7). 

We see further that these axioms give physical meanings to the coefficients 
occurring in a superposition process, or at least to the squares of their moduli. 
The simplest such physical meanings are obtained when we put $\psi_0$ equal to $\psi_1$ 
or $\psi_2$ in the above example. This gives the result that $|c_1|^2$ is the probability of 
agreement of $c_1\psi_1 + c_2\psi_2$ with $\psi_1$ and $|c_2|^2$ is that of $c_1\psi_1 + c_2\psi_2$ with $\psi_2$. The sum 
of these two probabilities of agreement is unity, as could have been inferred from 
the definition of superposition of §6. We may call $|c_1|^2$ and $|c_2|^2$ the weights with 
which $\psi_1$ and $\psi_2$ occur in the superposition process. The state $c_1\psi_1 + c_2\psi_2$ is not 
completely determined by these weights, as a phase factor, namely, the argument 
of $c_1/c_2$ is also necessary. This phase has no such simple physical meaning as 
the weights. 
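
The weights can be illustrated numerically in the vector picture. The following is only a sketch under stated assumptions: the two-component vectors, the coefficient values and the helper name `inner` are illustrative choices, not part of the text. It checks that the two probabilities of agreement are $|c_1|^2$ and $|c_2|^2$, that they sum to unity, and that the relative phase is extra information not contained in the weights.

```python
import cmath

def inner(phi_of, psi):
    """The number 'phi psi', phi being the conjugate imaginary of the vector phi_of."""
    return sum(a.conjugate() * b for a, b in zip(phi_of, psi))

# Assumed orthonormal pair psi_1, psi_2 in a two-dimensional vector picture.
psi1 = [1 + 0j, 0j]
psi2 = [0j, 1 + 0j]

# Illustrative coefficients chosen so that |c1|^2 + |c2|^2 = 1.
c1 = 0.6 * cmath.exp(0.3j)
c2 = 0.8 * cmath.exp(-1.1j)
state = [c1 * a + c2 * b for a, b in zip(psi1, psi2)]

weight1 = abs(inner(psi1, state)) ** 2   # probability of agreement with psi_1
weight2 = abs(inner(psi2, state)) ** 2   # probability of agreement with psi_2
relative_phase = cmath.phase(c1 / c2)    # argument of c1/c2, not fixed by the weights
```

Changing the phases of `c1` and `c2` while keeping their moduli leaves both weights unchanged, which is the sense in which the weights alone do not determine the state.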
9. Algebra of Observables 


We must now introduce dynamical variables into the analysis. In classical 
mechanics a dynamical variable, for any state of the system, is given by a particular 
function of the time and is thus something that refers to all times. In the quantum 
theory a dynamical variable is no longer given by an ordinary function of the time, 
although it must still be something that refers to all times if it is to be the analogue 
of a classical dynamical variable. In quantum mechanics it is more convenient 
to deal with something that refers to one particular time instead of to all times, 
analogous to the value of a classical variable at a particular instant of time. 
We shall call such a quantity an observable. We can now say, in both classical 
and quantum mechanics, that any observation consists in measuring an observable 
and the result of such an observation is a number. The measurement of a dynamical 
variable for a particular state would in the classical theory give as result a function 
of the time and would in the quantum theory in general have no meaning. 

We now denote each observable by a symbol. Thus the value of a Cartesian
co-ordinate of an electron at a particular time $t_1$ would be an observable and could
be denoted by the symbol $x(t_1)$. A dynamical variable, such as $x(t)$, may be
regarded as an observable that depends on a parameter $t$ which denotes the time.
The symbols that denote observables will be used in the analysis along with 
the symbols that denote states, in accordance with certain rules and axioms that 
will now be given. 

Any symbol $\alpha$ denoting an observable can be multiplied into any symbol $\psi$
denoting a state, giving a product, which must be written $\alpha\psi$ with the $\psi$
factor on the right-hand side. This product is of the nature of a $\psi$ and thus denotes
a state and can be added to other $\psi$'s. In the vector picture of the $\psi$'s we should
say that an observable $\alpha$ is an operator which can be applied to any vector $\psi$
to give another vector $\alpha\psi$. We assume the distributive axiom of multiplication,
i.e.

$$\alpha(\psi_1 + \psi_2) = \alpha\psi_1 + \alpha\psi_2 \tag{12}$$

and we also assume

$$\alpha(c\psi) = c(\alpha\psi) \tag{13}$$

where $c$ is any number. In the vector picture this means that the operator $\alpha$
is a linear operator and thus consists of rotations and uniform extensions or
compressions applied to the vector field. The multiplication of the $\psi$'s by a number
is an operation on them which satisfies these conditions, so that an ordinary
number may be regarded as a special case of an observable. Its physical meaning
will be discussed later (see §11).

If an observable $\alpha$ is such that $\alpha\psi = 0$ for all $\psi$, then we assume that
$\alpha = 0$.

This means that an observable is completely determined when its product with
an arbitrary $\psi$ is given, since if we have two observables whose product with
an arbitrary $\psi$ is the same, their difference must vanish. We now define the sum
$\alpha_1 + \alpha_2$ of two observables $\alpha_1$ and $\alpha_2$ by the condition

$$(\alpha_1 + \alpha_2)\psi = \alpha_1\psi + \alpha_2\psi \tag{14}$$

for all $\psi$. The commutative and associative laws for the addition of observables
follow at once from this definition and from the corresponding laws for the addition
of $\psi$-symbols. We further define the product $\alpha_1\alpha_2$ of two observables
$\alpha_1$ and $\alpha_2$ by the condition

$$(\alpha_1\alpha_2)\psi = \alpha_1(\alpha_2\psi) \tag{15}$$

for all $\psi$. The associative and distributive laws for the multiplication of observables
follow at once from the definition, e.g. for the associative law we have

$$[(\alpha_1\alpha_2)\alpha_3]\psi = (\alpha_1\alpha_2)(\alpha_3\psi) = \alpha_1[\alpha_2(\alpha_3\psi)]
  = \alpha_1[(\alpha_2\alpha_3)\psi] = [\alpha_1(\alpha_2\alpha_3)]\psi$$

and since this holds for all $\psi$ we must have

$$(\alpha_1\alpha_2)\alpha_3 = \alpha_1(\alpha_2\alpha_3).$$

However, the commutative law for the multiplication of observables in general does
not hold, i.e. in general $\alpha_1\alpha_2$ is not equal to $\alpha_2\alpha_1$. In the special
case when $\alpha_1\alpha_2$ is equal to $\alpha_2\alpha_1$, we say that $\alpha_1$ commutes
with $\alpha_2$ or that $\alpha_1$ and $\alpha_2$ commute. We say that three or more
observables commute when each commutes with all the others.
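
In the vector picture these rules are those of matrix algebra, and a small numerical sketch may help. The three 2x2 matrices and the vector below are arbitrary illustrative choices, and the helper names `matmul` and `apply` are assumptions made for this example only. The sketch checks definition (15), the associative law, and the failure of the commutative law for one particular pair.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def apply(a, v):
    """Apply the linear operator a to the vector v."""
    return [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]

# Illustrative operators and vector (not taken from the text).
a1 = [[0, 1], [1, 0]]
a2 = [[1, 0], [0, -1]]
a3 = [[2, 1], [0, 3]]
psi = [1, 2]

product_rule = apply(matmul(a1, a2), psi) == apply(a1, apply(a2, psi))          # (15)
associative = matmul(matmul(a1, a2), a3) == matmul(a1, matmul(a2, a3))
commutative = matmul(a1, a2) == matmul(a2, a1)                                   # False here
```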

Since the theory is to be symmetrical between the $\psi$'s and the $\phi$'s it must
be possible to multiply any observable $\alpha$ into any $\phi$-symbol. The product, which
we always write as $\phi\alpha$ with the $\phi$ on the left-hand side, must be of the nature
of a $\phi$ and thus be capable of denoting a state and of being added to other $\phi$'s.
Corresponding to (12) and (13) we must have

$$(\phi_1 + \phi_2)\alpha = \phi_1\alpha + \phi_2\alpha$$

and

$$(c\phi)\alpha = c(\phi\alpha).$$

We require one more axiom in our symbolic algebra, namely, an associative axiom
of multiplication which says that

$$(\phi\alpha)\psi = \phi(\alpha\psi),$$

so that either of these numbers may be written as $\phi\alpha\psi$ without brackets.
This final axiom enables us to prove that the sum or product of two observables,
defined by (14) or (15), is the same as the sum or product defined in the analogous
way with $\phi$'s instead of $\psi$'s, i.e. by

$$\phi(\alpha_1 + \alpha_2) = \phi\alpha_1 + \phi\alpha_2 \tag{16}$$

or

$$\phi(\alpha_1\alpha_2) = (\phi\alpha_1)\alpha_2$$

for all $\phi$. In the case of the sum, for instance, if we take the definition (14) we can
infer from it, with the help of (6), that

$$\phi(\alpha_1 + \alpha_2)\psi = \phi\alpha_1\psi + \phi\alpha_2\psi$$

or

$$[\phi(\alpha_1 + \alpha_2) - \phi\alpha_1 - \phi\alpha_2]\psi = 0$$

for all $\phi$ and $\psi$. Hence from (10) we must have

$$\phi(\alpha_1 + \alpha_2) - \phi\alpha_1 - \phi\alpha_2 = 0,$$

which is the required result (16). The case of the product is quite similar. A further
similar argument enables one to deduce, from the assumption that if $\alpha\psi = 0$ for
all $\psi$ then $\alpha = 0$, the result that if $\phi\alpha = 0$ for all $\phi$ then $\alpha = 0$.


10. Conjugate Complex Observables 


It is convenient to count sums and products of any observables as other observables. 
This involves, as we shall see shortly, an extension of the meaning of an observable 
to include the analogues of complex functions of classical dynamical variables, or 
rather the values of such complex functions at specified times. An observable 
is thus not necessarily a quantity capable of direct measurement by a single 
observation, but is a theoretical generalization of such a quantity. 

More generally it is convenient to count any operator that can be multiplied
into the $\psi$'s and $\phi$'s in accordance with the foregoing axioms as an observable.
Thus one can define an observable $\alpha$ by specifying the values of $\alpha\psi$ for all
$\psi$, and these values may be chosen arbitrarily except for the condition (12). If one
takes a complete set of independent $\psi$'s, $\psi_r$ say, a complete set being one such that
any $\psi$ can be expressed linearly in terms of its members, then the values of
$\alpha\psi_r$ for the members of this set $\psi_r$ may be chosen quite arbitrarily, and the
value of $\alpha\psi$ when $\psi$ is not a member of the set is then determined by (12), so
that $\alpha$ is determined. Again, instead of specifying the $\alpha\psi_r$'s, one could define
$\alpha$ by specifying the numbers $\phi_s\alpha\psi_r$, which are quite arbitrary when the
$\phi_s$'s as well as the $\psi_r$'s form a complete independent set. The fact that $\alpha$ is
uniquely determined in this way follows from (10).
Now let $\alpha$ be any observable and consider the equation

$$\phi_s\alpha\psi_r = \overline{\phi_r\beta\psi_s} \tag{17}$$

where $\psi_r$ and $\psi_s$ are any two $\psi$'s and $\phi_r$ and $\phi_s$ are their conjugate
imaginaries. We can consider this equation as defining a new observable $\beta$, since we
can assume (17) holds for a complete set of independent $\psi_r$'s and for a complete set of
independent $\psi_s$'s, and since, as is easily verified, if (17) holds for two values of
$\psi_r$ it must hold also for any linear combination of them, and similarly for $\psi_s$.
In fact if (17) holds for $\psi_r = \psi_1$ and for $\psi_r = \psi_2$, we have the equations

$$\phi_s\alpha\psi_1 = \overline{\phi_1\beta\psi_s}, \qquad \phi_s\alpha\psi_2 = \overline{\phi_2\beta\psi_s},$$

from which we can deduce

$$\phi_s\alpha(c_1\psi_1 + c_2\psi_2) = c_1\,\phi_s\alpha\psi_1 + c_2\,\phi_s\alpha\psi_2
  = c_1\,\overline{\phi_1\beta\psi_s} + c_2\,\overline{\phi_2\beta\psi_s}
  = \overline{(\bar{c}_1\phi_1 + \bar{c}_2\phi_2)\beta\psi_s},$$

which shows that (17) holds also for $\psi_r = c_1\psi_1 + c_2\psi_2$.
The observable $\beta$ defined by (17) is called the conjugate complex of
the observable $\alpha$ and is written $\bar{\alpha}$. Thus

$$\phi_s\alpha\psi_r = \overline{\phi_r\bar{\alpha}\psi_s}. \tag{18}$$

The conjugate complex of $\bar{\alpha}$ is $\alpha$. We use the words ‘conjugate complex’ and
not ‘conjugate imaginary’ since it is permissible to add together an observable and its
conjugate complex, both being quantities of the same nature, so that one can split
up any observable $\alpha$ into its real part, $\tfrac{1}{2}(\alpha + \bar{\alpha})$, and imaginary
part, $\tfrac{1}{2}(\alpha - \bar{\alpha})$. The condition for an observable $\alpha$ to be real is

$$\phi_s\alpha\psi_r = \overline{\phi_r\alpha\psi_s}. \tag{19}$$

In the special case when the observable $\alpha$ is a number, its conjugate complex
defined by (18) is the ordinary conjugate complex number.

It will now be proved that if $\psi_r$ and $\phi_r$ are conjugate imaginary symbols, then
so also are $\alpha\psi_r$ and $\phi_r\bar{\alpha}$ for any observable $\alpha$. If we denote by
$\phi$ the conjugate imaginary of $\alpha\psi_r$, then from (8)

$$\phi\psi_s = \overline{\phi_s\alpha\psi_r}$$

for arbitrary $\psi_s$. But from the definition (18)

$$\overline{\phi_s\alpha\psi_r} = \phi_r\bar{\alpha}\psi_s.$$

Hence

$$\phi\psi_s = \phi_r\bar{\alpha}\psi_s$$

for arbitrary $\psi_s$, so that from (10) (with $\phi$'s and $\psi$'s interchanged)

$$\phi = \phi_r\bar{\alpha}, \tag{20}$$

which was to be proved.
We shall now find the conjugate complex of the product $\alpha_1\alpha_2$ of two
observables $\alpha_1$ and $\alpha_2$. The equation that defines this conjugate complex,
$\overline{\alpha_1\alpha_2}$, is

$$\phi_p\,\overline{\alpha_1\alpha_2}\,\psi_q = \overline{\phi_q\alpha_1\alpha_2\psi_p} \tag{21}$$

for arbitrary $\psi_q$ and $\phi_p$. If in formula (8) we put

$$\phi_s = \phi_q\alpha_1, \qquad \psi_r = \alpha_2\psi_p,$$

which require, from the theorem of equation (20),

$$\psi_s = \bar{\alpha}_1\psi_q, \qquad \phi_r = \phi_p\bar{\alpha}_2,$$

we get

$$\phi_p\bar{\alpha}_2\bar{\alpha}_1\psi_q = \overline{\phi_q\alpha_1\alpha_2\psi_p}.$$

Comparing this with (21) we obtain, since these equations hold for arbitrary $\phi_p$
and $\psi_q$, the result

$$\overline{\alpha_1\alpha_2} = \bar{\alpha}_2\bar{\alpha}_1. \tag{22}$$

Thus to find the conjugate complex of a product we must take the conjugate complex
of each factor and reverse their order. This rule holds also when there are more
than two factors in the product, as may be proved by successive applications of
the rule for two factors, e.g.

$$\overline{\alpha_1\alpha_2\alpha_3} = \bar{\alpha}_3\,\overline{\alpha_1\alpha_2} = \bar{\alpha}_3\bar{\alpha}_2\bar{\alpha}_1.$$

As a corollary of this theorem we have that if $\alpha_1$ and $\alpha_2$ are two real
observables, then $\alpha_1\alpha_2 + \alpha_2\alpha_1$ is also real and
$\alpha_1\alpha_2 - \alpha_2\alpha_1$ is imaginary. Only when $\alpha_1$ and $\alpha_2$ commute
is $\alpha_1\alpha_2$ also real. Equation (18) and the theorem of equation (20)
show that it is a general rule that when one forms the conjugate imaginary or
the conjugate complex of any permissible combination of the symbols denoting
observables and states, one must reverse the order of the factors in a product and
take the conjugate imaginary or conjugate complex of each factor.
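
The reversal rule (22) and its corollary can be checked in the matrix picture, where the conjugate complex of an observable corresponds to the conjugate transpose of its matrix. This identification, the two Hermitian matrices and the helper names below are assumptions made for illustration; the text itself works purely symbolically.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(a):
    """Conjugate transpose, standing in here for the conjugate complex of an observable."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]

def combine(a, b, sign):
    """Entrywise a + sign * b."""
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Two illustrative real (Hermitian) observables that do not commute.
a1 = [[1, 2 - 1j], [2 + 1j, -1]]
a2 = [[0, 1j], [-1j, 3]]

prod = matmul(a1, a2)
reversal_holds = dagger(prod) == matmul(dagger(a2), dagger(a1))   # rule (22)

sym = combine(prod, matmul(a2, a1), +1)    # a1 a2 + a2 a1
comm = combine(prod, matmul(a2, a1), -1)   # a1 a2 - a2 a1
sym_is_real = dagger(sym) == sym
comm_is_imaginary = dagger(comm) == [[-x for x in row] for row in comm]
product_is_real = dagger(prod) == prod     # False, since a1 and a2 do not commute
```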


11. Physical Interpretation of Algebra of Observables 


The axioms and assumptions that we have made about observables are so far purely
mathematical and have no physical implications. The physical connexions, which
cause these axioms and assumptions to become physical laws, will now be given.
The observables that appear in the discussion in this section must be understood
to be all real observables.

If a state $\psi_r$ and an observable $\alpha$ are such that, when an observation is made of
the observable with the system in this state the result is certain to be the number
$a$, we assume this information can be expressed by the equation

$$\alpha\psi_r = a\psi_r. \tag{23}$$

Conversely, when an equation of this type is given we assume it has the physical
meaning that a measurement of the observable $\alpha$ with the system in state $\psi_r$ will
certainly give for result the number $a$ or that the observable $\alpha$ has the value $a$ for
the state $\psi_r$, to use a classical way of speaking which is permissible in this case.
Equation (23) is equivalent to

$$\phi_r\alpha = a\phi_r \tag{24}$$

provided $a$ is real, since, from the theorem of equation (20), equation (24) is just
the conjugate imaginary of equation (23). Thus the symmetry between the $\phi$'s
and $\psi$'s is maintained.

In the special case when the observable $\alpha$ is a number, then equation (23) holds
for every state $\psi_r$ with this same number for $a$. This means that the observable
is of a trivial kind such that any measurement of it always gives one particular
result, independent of the state of the system.

We can now deduce some physical results from the theory. For example, if for
a given state $\psi$ the observable $\alpha_1$ has the value $a_1$ and the observable
$\alpha_2$ has the value $a_2$, we have the equations

$$\alpha_1\psi = a_1\psi, \qquad \alpha_2\psi = a_2\psi, \tag{25}$$

from which we can deduce that

$$(\alpha_1 + \alpha_2)\psi = (a_1 + a_2)\psi,$$

$$\alpha_1\alpha_2\psi = a_1 a_2\psi,$$

and thus infer that for the state $\psi$ the observable $\alpha_1 + \alpha_2$ has the value
$a_1 + a_2$ and the observable $\alpha_1\alpha_2$ has the value $a_1 a_2$. These results are
necessary for the theory to be consistent, since the observations of $\alpha_1$ and
$\alpha_2$ for the system in state $\psi$ are compatible, as neither observation need cause a
change in the state, so that one would expect the ordinary classical ideas of measurement
to be valid. For the same reason we require the result, which may easily be deduced from
the first of equations (25) by induction, that $f(\alpha_1)$ has the value $f(a_1)$ for the
state $\psi$, where $f$ denotes any function expressible as a power series. We shall later
define more general functions of an observable than are expressible as power series, and
for these more general functions this result will still hold. In fact it will form the
basis of the definition of these more general functions.

Again, if we are given that an observable $\alpha$ has the value $a$ for each of two
states $\psi_1$ and $\psi_2$, we can write down the equations

$$\alpha\psi_1 = a\psi_1, \qquad \alpha\psi_2 = a\psi_2,$$

from which we can deduce that

$$\alpha(c_1\psi_1 + c_2\psi_2) = a(c_1\psi_1 + c_2\psi_2).$$

Thus $\alpha$ has the value $a$ also for any state obtainable by superposition of $\psi_1$ and
$\psi_2$. This result was deduced in §6 from the definition of superposition and the fact
that it is also deducible from the present analysis illustrates the self-consistency
of the theory.

In classical mechanics an observable always has a particular value for any state. 
This is not so in quantum mechanics, where a special condition of the type (23) is 
necessary for an observable to have a particular value for a certain state. In general 
the measurement of an observable for a given state will lead to one or other of 
a number of possible results, according to a certain probability law. The question 
now to be considered is what can be said in the general case about an observable 
with respect to a state. If one has an observable $\alpha$ and one takes any two states
$\psi_r$, $\psi_s$, one can form the number $\phi_s\alpha\psi_r$. This is the only general way of forming
numbers referring to an observable and particular states. Thus an observable has 
a numerical value associated with each pair of states, in stark contrast* to the 
classical theory, where an observable always has a numerical value associated with 
a single state, namely, the value of the observable for that state. 

We could, however, as a special case, take conjugate imaginary symbols $\phi_r$ and
$\psi_r$ which both denote the same state, and form the number $\phi_r\alpha\psi_r$. We should
then have a number completely determined by the observable $\alpha$ and the state $\psi_r$,
provided the $\phi_r$ and $\psi_r$ are normalized, since, as is easily verified,
$\phi_r\alpha\psi_r$ remains invariant under any transformation of the type (4) with
$b_r = \bar{c}_r$ that preserves the normalization. Thus it is possible to associate with the
observable $\alpha$ a definite numerical value for a single state $\psi_r$, but it would not be
convenient to define this number as the value of the observable $\alpha$ for the state
$\psi_r$, for the following reason. If for a particular state $a_1$ is the value of an
observable $\alpha_1$ and $a_2$ is that of $\alpha_2$, then we should require $a_1 + a_2$ to be
the value of $\alpha_1 + \alpha_2$ and $a_1 a_2$ to be that of $\alpha_1\alpha_2$. The definition
just proposed for the value of an observable for a state would give

$$a_1 = \phi_r\alpha_1\psi_r, \qquad a_2 = \phi_r\alpha_2\psi_r,$$


*Original:- sharp distinction 
from which we could deduce

$$a_1 + a_2 = \phi_r(\alpha_1 + \alpha_2)\psi_r,$$

and hence infer that $a_1 + a_2$ is the value of $\alpha_1 + \alpha_2$. We could not, however,
deduce that

$$a_1 a_2 = \phi_r\alpha_1\alpha_2\psi_r,$$

which would, in fact, in general be untrue, so that we could not infer that $a_1 a_2$ is
the value of $\alpha_1\alpha_2$. Thus we cannot take $\phi_r\alpha\psi_r$ as a general definition
of the value of an observable $\alpha$ for a state $\psi_r$. We must rely on the equation (23)
to give the definition of this value in the special cases when it exists.

The fact, however, that the proof fails only in the case of the product $\alpha_1\alpha_2$
and not in the case of the sum $\alpha_1 + \alpha_2$ allows us to say that $\phi_r\alpha\psi_r$ is
the average value of the observable $\alpha$ for the state $\psi_r$. This is so because the
average of the sum of two quantities must equal the sum of their averages, but the average
of their product need not equal the product of their averages. Thus our symbolic algebra
allows us to define a certain number as being the average value of an observable for
a particular state, without leading us to inconsistencies. The assumption that this
so-defined average is really what one would obtain if one measured the observable
a large number of times (the system having to be re-prepared each time, of course,
in order that it may be in the proper state) and worked out the average result,
constitutes the main link connecting the symbolic algebra with physical facts.
The other links previously given, i.e. the assumption that $|\phi_r\psi_s|^2$ is the
probability of agreement of $\phi_r$ with $\psi_s$ and the assumption that the equation
$\alpha\psi = a\psi$ holds when an observation of $\alpha$ on the system in state $\psi$ will
certainly lead to the result $a$, will be shown later (§18) to be deducible from this main
link as special cases.
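
The distinction between sums and products of averages can be seen in a small numerical sketch. The normalized two-component state, the two real observables and the helper names are illustrative assumptions, not part of the text. The average of the sum equals the sum of the averages, while the average of a product (here the square of one observable) differs from the product of the averages.

```python
def apply(a, v):
    return [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]

def inner(phi_of, psi):
    """The number 'phi psi', phi being the conjugate imaginary of the vector phi_of."""
    return sum(x.conjugate() * y for x, y in zip(phi_of, psi))

def average(a, psi):
    """phi_r a psi_r for a normalized state psi (real when a is a real observable)."""
    return inner(psi, apply(a, psi)).real

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

psi = [0.6, 0.8]                 # assumed normalized state
a1 = [[1, 0], [0, -1]]           # illustrative real observables
a2 = [[0, 1], [1, 0]]
a_sum = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(a1, a2)]

avg1, avg2 = average(a1, psi), average(a2, psi)
avg_sum = average(a_sum, psi)              # equals avg1 + avg2
avg_square = average(matmul(a1, a1), psi)  # average of a1 a1
square_of_avg = avg1 ** 2                  # differs from avg_square
```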

If an observable $\alpha$ has the value $a$ for a state $\psi_r$, so that equation (23) holds,
we can deduce that

$$\phi_r\alpha\psi_r = \phi_r a\psi_r = a\,\phi_r\psi_r = a$$

if $\phi_r$ and $\psi_r$ are normalized. Hence the average value of $\alpha$ for the state
$\psi_r$ is found to be $a$, as is necessary for the physical interpretation of the theory
to be consistent. We cannot, of course, deduce the converse, i.e. deduce (23) from the
equation $\phi_r\alpha\psi_r = a$.

The numbers $\phi_s\alpha\psi_r$ which the theory also gives us, where $\phi_s$ and $\psi_r$
denote two different states, do not have any such direct physical interpretation as the
numbers $\phi_r\alpha\psi_r$. We shall find later that $|\phi_s\alpha\psi_r|^2$ is, apart from a
certain factor, the probability of a transition from state $\psi_r$ to state $\phi_s$ being
caused by a perturbing energy whose time integral is $\alpha$. (See §52.)
12. Example of Algebra of Observables


As an example of the symbolic algebra of observables, which is the same as ordinary
algebra except for the non-validity of the commutative law of multiplication,
we shall consider some properties of two observables, $p$ and $q$, that satisfy

$$qp - pq = i, \tag{26}$$

$i$ being a root of minus one. From §10 we see that it is possible for two real
observables $p$ and $q$ to satisfy this relation. If we multiply (26) by $q$ on the left,
and then by $q$ on the right, we obtain

$$q^2 p - qpq = iq$$

and

$$qpq - pq^2 = iq,$$

from which, by addition, we find

$$q^2 p - pq^2 = 2iq.$$

This result can be generalized. If we multiply (26) firstly by $q^{n-1}$ on the left,
secondly by $q^{n-2}$ on the left and $q$ on the right, thirdly by $q^{n-3}$ on the left and
$q^2$ on the right, and so on until $n$-thly we multiply simply by $q^{n-1}$ on the right,
we get the equations

$$q^n p - q^{n-1} p q = i q^{n-1}$$
$$q^{n-1} p q - q^{n-2} p q^2 = i q^{n-1}$$
$$q^{n-2} p q^2 - q^{n-3} p q^3 = i q^{n-1}$$
$$\cdots$$
$$q p q^{n-1} - p q^n = i q^{n-1},$$

which give, on addition, the result

$$q^n p - p q^n = n i q^{n-1}.$$

This result may be written

$$q^n p - p q^n = i\, dq^n/dq.$$

It follows that, if $f(q)$ is any function of $q$ expressible as a power series,

$$fp - pf = i\, df/dq, \tag{27}$$




since this result must hold separately for each term in the expansion.
As a special case, we may take for $f$ the power series

$$f(q) = \sum_{n=0}^{\infty} \frac{(icq)^n}{n!},$$

where $c$ is a number. We can define this to be $e^{icq}$ and the ordinary exponential
theorem will then hold, since no symbol that does not commute with $q$ could occur
in the proof of it to make a difference between the present and ordinary algebra.
With this expression for $f$, (27) becomes

$$e^{icq} p - p e^{icq} = -c\, e^{icq}$$

or

$$e^{icq} p = (p - c)\, e^{icq}. \tag{28}$$
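
The relation $q^n p - p q^n = n i q^{n-1}$ can be checked in one concrete realization. The sketch below assumes, purely for illustration, that $q$ acts on polynomials in a variable as multiplication by that variable and $p$ as $-i\,d/dq$, which satisfies (26); polynomials are stored as coefficient lists, lowest power first. The function names are hypothetical.

```python
def q_op(f):
    """Multiply the polynomial f by q (shift coefficients up by one power)."""
    return [0j] + list(f)

def p_op(f):
    """Apply p = -i d/dq, an assumed realization satisfying qp - pq = i."""
    return [-1j * k * f[k] for k in range(1, len(f))] or [0j]

def repeat(op, n, f):
    """Apply the operator op to f a total of n times."""
    for _ in range(n):
        f = op(f)
    return f

def commutator_qn_p(n, f):
    """Coefficients of (q^n p - p q^n) f."""
    left = repeat(q_op, n, p_op(f))
    right = p_op(repeat(q_op, n, f))
    size = max(len(left), len(right))
    left = left + [0j] * (size - len(left))
    right = right + [0j] * (size - len(right))
    return [x - y for x, y in zip(left, right)]

f = [1, 2, 3]                                        # 1 + 2q + 3q^2, an arbitrary test polynomial
n = 3
lhs = commutator_qn_p(n, f)                          # (q^3 p - p q^3) f
rhs = [1j * n * c for c in repeat(q_op, n - 1, f)]   # 3 i q^2 f
```

Both lists hold the coefficients of the same polynomial, and taking `n = 1` gives back relation (26) applied to `f`.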


II. EIGENVALUES AND EIGENSTATES 


13. Definitions and Elementary Properties 


IN the present chapter we shall consider some of the properties of real observables.
If we have any real observable $\alpha$ we can write down the equation

$$\alpha\psi = a\psi \tag{1}$$

where $a$ is a number, and consider it as an equation for the two unknowns $a$ and $\psi$.
If $a$ and $\psi$ are any solution, we call them respectively an eigenvalue and an
eigen-$\psi$ of the observable $\alpha$. It may easily be seen that the eigenvalues are all
real numbers, since if we multiply (1) by the $\phi$-symbol that is conjugate imaginary to
$\psi$ we obtain

$$\phi\alpha\psi = a\,\phi\psi.$$

Now $\phi\alpha\psi$ and $\phi\psi$ are both real, as follows from equations (19) and (8) of
the preceding chapter when one takes $r = s$, and hence $a$ must be real. Analogous
to (1) is the equation

$$\phi\alpha = a\phi. \tag{2}$$

If $a$ and $\psi$ are any solution of (1), then the same value of $a$ and the $\phi$ that
is conjugate imaginary to this $\psi$ form a solution of (2), since equation (2) is
then the conjugate imaginary of equation (1). We call the $\phi$'s that solve (2)
eigen-$\phi$'s, and the states denoted by the eigen-$\psi$'s or eigen-$\phi$'s we call
eigenstates of the observable $\alpha$. Each eigen-$\psi$, eigen-$\phi$ or eigenstate is
associated with one definite eigenvalue, or, as we shall say, belongs to that eigenvalue.

The physical meaning of an eigenvalue is that there exists a state, namely,
the eigenstate belonging to it, such that a measurement of the observable when
the system is in this state will certainly give for result just this eigenvalue.
The eigenvalues of an observable are the possible results of a measurement of this
observable. Every possible result of the measurement of $\alpha$ must be an eigenvalue as
it must satisfy (1) when one takes for the $\psi$ in this equation the state of the system
immediately after the observation has been made. The whole set of eigenvalues
of an observable may consist of a discrete set of numbers, or a continuous range
of numbers, or perhaps both. The calculation of eigenvalues is one of the main
problems of quantum mechanics.

In the special case when the observable is a number, it has only one eigenvalue,
namely, itself, and any state is an eigenstate. If $\alpha$ is any observable and $c$ is a
number, then, as follows at once from the definitions, each eigenvalue of $\alpha + c$ is
greater by $c$ than an eigenvalue of $\alpha$ and each eigenstate of $\alpha + c$ is an
eigenstate of $\alpha$. Similarly each eigenvalue of $c\alpha$ is $c$ times an eigenvalue of
$\alpha$ and each eigenstate of $c\alpha$ is an eigenstate of $\alpha$.

The theorem will now be proved that two eigenstates belonging to two different
eigenvalues of an observable are orthogonal. Suppose the eigenstate $\psi_1$ belongs to
the eigenvalue $a_1$ and the eigenstate $\psi_2$ belongs to the eigenvalue $a_2$. We then
have the equations

$$\alpha\psi_1 = a_1\psi_1 \tag{3}$$
$$\phi_2\alpha = a_2\phi_2. \tag{4}$$

Multiplying (3) by $\phi_2$ on the left-hand side and (4) by $\psi_1$ on the right-hand side,
we obtain

$$\phi_2\alpha\psi_1 = a_1\,\phi_2\psi_1$$

and

$$\phi_2\alpha\psi_1 = a_2\,\phi_2\psi_1.$$

Hence

$$(a_1 - a_2)\,\phi_2\psi_1 = 0,$$

so that, if $a_1$ is not equal to $a_2$, then $\phi_2\psi_1 = 0$ and the two states $\psi_1$ and
$\psi_2$ are orthogonal. This theorem is required by the physical meaning of eigenstates,
since for two eigenstates belonging to two different eigenvalues there exists
an observation, namely, the measurement of the observable $\alpha$, for which the result
must certainly be different in the two cases, so that the two states are, by definition,
orthogonal.
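
A small numerical instance of this theorem, offered only as a sketch: the 2x2 Hermitian matrix below and its two eigenvectors are illustrative choices worked out by hand, and the helper names are assumptions. The code checks equation (1) for each eigenvector and then the vanishing of $\phi_2\psi_1$.

```python
def apply(a, v):
    return [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]

def inner(phi_of, psi):
    """The number 'phi psi', phi being the conjugate imaginary of the vector phi_of."""
    return sum(x.conjugate() * y for x, y in zip(phi_of, psi))

alpha = [[2, 1j], [-1j, 2]]     # illustrative real (Hermitian) observable

psi1, a1 = [1, -1j], 3          # eigenvector belonging to the eigenvalue 3
psi2, a2 = [1, 1j], 1           # eigenvector belonging to the eigenvalue 1

# Residuals of alpha psi = a psi for each pair; both should vanish.
residual1 = max(abs(x - a1 * y) for x, y in zip(apply(alpha, psi1), psi1))
residual2 = max(abs(x - a2 * y) for x, y in zip(apply(alpha, psi2), psi2))

overlap = inner(psi2, psi1)     # phi_2 psi_1, zero because a1 differs from a2
```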

If $\psi_1$ and $\psi_2$ are two eigen-$\psi$'s belonging to the same eigenvalue, then it is
evident that any linear combination of them ($c_1\psi_1 + c_2\psi_2$) must also be an
eigen-$\psi$ belonging to this eigenvalue. It will now be proved that no linear combination
of eigen-$\psi$'s belonging to different eigenvalues can be an eigen-$\psi$, i.e. that
eigen-$\psi$'s belonging to different eigenvalues are all necessarily independent. If this
were not so we should have a relation of the type

$$\sum_r c_r\psi_r = 0, \tag{5}$$

with numerical coefficients $c_r$, between a number of eigen-$\psi$'s belonging to different
eigenvalues. We can without loss of generality assume that there is no other
independent relation of this type between these eigen-$\psi$'s, since if there were others
we could eliminate some of the $\psi$'s, which would leave a single relation of this type
between the remainder. Multiplying (5) by $\alpha$, we find

$$\sum_r c_r a_r\psi_r = 0, \tag{6}$$

if $a_r$ is the eigenvalue belonging to $\psi_r$. Now (6) is a linear relation between the
$\psi_r$'s with numerical coefficients and therefore, by hypothesis, must not be independent
of (5). This requires that the $a_r$'s shall all be equal, so that the $\psi_r$'s must all
belong to the same eigenvalue.

This theorem could have been inferred from the definition of superposition
in §6 together with the physical meaning of eigenstates. A relation of the type
(5) implies that one of the eigenstates, $\psi_1$ say, is obtainable by superposition of
the others $\psi_2, \psi_3, \ldots$, so that any result that can be obtained from an observation
of the system in state $\psi_1$ must have a finite probability of being the result when
the observation is made on the system in at least one of the states $\psi_2, \psi_3, \ldots$.
This would not be the case if the observation consisted in the measurement
of the observable $\alpha$ when the $\psi_r$'s all belong to different eigenvalues of $\alpha$.
Thus a relation of the type (5) is impossible.


14. The Expansion Theorem 


The expansion theorem of the theory of eigenvalues asserts that an arbitrary
$\psi$-symbol can be expanded in terms of eigen-$\psi$'s of any real observable, thus

$$\psi = \sum_r \psi_r \tag{7}$$

where the $\psi_r$'s are eigen-$\psi$'s of a real observable $\alpha$. Such an expansion must be
unique, since otherwise there would be a relation of the type (5) between eigen-$\psi$'s
belonging to different eigenvalues. If the eigenvalues of $\alpha$ do not form a discrete
set of numbers but a continuous range, or if they form both a continuous range
and a discrete set, then the number of eigen-$\psi$'s occurring in (7) may be more
than an enumerable number and equal to the number of points on a line. In such
a case we may require an integral of the type

$$\psi = \int \psi_p\, dp \tag{8}$$

in order to express the general $\psi$, or we may require both a sum and an integral.
The theory of $\psi$-symbols developed in the previous chapter does not give any
rigorous definition for an integral of the type (8). In order to get such a definition
one would have to introduce a number of new assumptions concerning limits and
continuity for the $\psi$-symbols, which would be beyond the scope of the present work.
For all physical purposes it is sufficient for one not to aim at a rigorous theory when
dealing with such things, but to content oneself with making use of rough intuitive
notions about limits and continuity, such as could be obtained, for instance,
from the vector picture of the $\psi$'s. These intuitive notions show that if one has
a $\psi$-symbol $\psi_p$ that involves a parameter $p$ in some reasonably continuous way,
one can differentiate or integrate $\psi_p$ with respect to $p$ and the result will be
another $\psi$-symbol.

Under these circumstances one cannot, of course, attempt to give a rigorous
deduction of the expansion theorem from the symbolic algebra. The following
argument, however, makes the theorem appear plausible. Consider the $\psi$-symbol
$\psi_\tau$ that is a function of the parameter $\tau$ and that satisfies the differential
equation

$$\frac{\partial\psi_\tau}{\partial\tau} = i\alpha\psi_\tau. \tag{9}$$

If $\psi_\tau$ is given for one value of $\tau$, then this equation fixes $\psi_\tau$ for a slightly
greater value of $\tau$. Thus we should expect this equation to have one solution, and only
one, for any given initial value for $\psi_\tau$, i.e. for $\psi_\tau$ equal to an arbitrary
$\psi_0$ when $\tau = 0$. Suppose now that this solution can be expressed as a Fourier series
or integral in $\tau$, thus, if we take for definiteness the case of the integral,

$$\psi_\tau = \int e^{ip\tau}\psi_p\, dp, \tag{10}$$

where $\psi_p$ is independent of $\tau$, but involves the new parameter $p$. Substituting
this expression for $\psi_\tau$ in (9), we obtain

$$\int ip\, e^{ip\tau}\psi_p\, dp = i\alpha \int e^{ip\tau}\psi_p\, dp$$

or

$$\int e^{ip\tau}\, p\psi_p\, dp = \int e^{ip\tau}\, \alpha\psi_p\, dp.$$

Since this equation holds for all values of $\tau$ we can equate coefficients of
$e^{ip\tau}$, which gives

$$p\psi_p = \alpha\psi_p.$$

Thus $\psi_p$ is an eigen-$\psi$ of $\alpha$ belonging to the eigenvalue $p$. If we now put
$\tau = 0$ in (10), we obtain

$$\psi_0 = \int \psi_p\, dp,$$

which expresses the arbitrary $\psi_0$ in terms of the eigen-$\psi$'s $\psi_p$ in the form (8).
The discrete terms such as occur in (7) would arise when the Fourier expansion
(10) requires terms of a Fourier series.
The weak point in the above argument is the assumption of the possibility of
a Fourier expansion (10) for $\psi_\tau$. If one takes the vector picture and considers
$\psi_\tau$ to be a vector varying continuously with $\tau$, one would expect some kind of
Fourier expansion to be possible, except when the magnitude of the vector tends to infinity
as $\tau \to \infty$, a possibility that may very well occur with an equation of motion of
the type (9). One can, however, exclude this possibility by making use of the fact
that $\alpha$ is a real observable. (For an observable that is not real the expansion
theorem is not necessarily true.) If $\phi_\tau$ is the $\phi$-symbol that is conjugate imaginary
to $\psi_\tau$, it will satisfy the conjugate imaginary differential equation to (9), which is

$$\frac{\partial\phi_\tau}{\partial\tau} = -i\phi_\tau\alpha.$$

Hence

$$\frac{\partial}{\partial\tau}(\phi_\tau\psi_\tau)
  = \phi_\tau\frac{\partial\psi_\tau}{\partial\tau} + \frac{\partial\phi_\tau}{\partial\tau}\psi_\tau
  = \phi_\tau\, i\alpha\psi_\tau - i\phi_\tau\alpha\psi_\tau = 0. \tag{11}$$

Thus the square of the modulus of the vector $\psi_\tau$, which is $\phi_\tau\psi_\tau$, remains
constant.
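
For a case with only discrete eigenvalues the argument can be followed numerically. The sketch below is an illustration under assumptions: it uses a hand-picked 2x2 Hermitian matrix with known eigenvectors, builds the solution of (9) as a sum of terms $e^{ip\tau}\psi_p$ (the discrete analogue of (10)), and checks both the constancy of the squared modulus and, by a finite difference, equation (9) itself. All names are hypothetical.

```python
import cmath

alpha = [[2, 1j], [-1j, 2]]                     # illustrative real observable
eig = [(3, [1, -1j]), (1, [1, 1j])]             # (eigenvalue, unnormalized eigenvector)

def apply(a, v):
    return [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]

def inner(phi_of, psi):
    return sum(x.conjugate() * y for x, y in zip(phi_of, psi))

def expand(psi):
    """Eigen-components psi_p of psi, so that psi is the sum of the returned vectors."""
    parts = []
    for a, e in eig:
        coeff = inner(e, psi) / inner(e, e)
        parts.append((a, [coeff * x for x in e]))
    return parts

def evolve(psi0, tau):
    """psi_tau = sum over p of exp(i p tau) psi_p."""
    out = [0j] * len(psi0)
    for a, part in expand(psi0):
        phase = cmath.exp(1j * a * tau)
        out = [o + phase * x for o, x in zip(out, part)]
    return out

psi0 = [0.6, 0.8j]                              # assumed normalized initial vector
norms = [inner(evolve(psi0, t), evolve(psi0, t)).real for t in (0.0, 0.7, 5.0)]

# Central-difference check of d(psi_tau)/d(tau) = i alpha psi_tau at tau = 0.7.
h, t = 1e-5, 0.7
deriv = [(x - y) / (2 * h) for x, y in zip(evolve(psi0, t + h), evolve(psi0, t - h))]
rhs = [1j * x for x in apply(alpha, evolve(psi0, t))]
residual = max(abs(x - y) for x, y in zip(deriv, rhs))
```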

From the above non-rigorous discussion one would expect the expansion 
theorem to follow rigorously from the symbolic algebra with the addition of suitable 
axioms about limits and continuity. The corresponding theorem for $\phi$'s must then,
of course, also hold. Throughout the rest of this chapter we shall, for definiteness, 
assume the expansions we have to deal with involve sums and not integrals. 
The theorems to be proved would still be true for integrals, only formal alterations 
in the proofs being required. These formal alterations would, however, require 
a new notation, and this will be given in the next chapter (see §22). 








15. Functions of an Observable 


The expansion theorem enables one to give a definition of a function of a real
observable of the same degree of generality as that of an ordinary function of
a real variable. Let $\alpha$ be a real observable and let $\psi_p$ be one of its eigen-$\psi$'s,
belonging to the eigenvalue $a_p$, so that

$$\alpha\psi_p = a_p\psi_p.$$

It is evident, as was mentioned in §11, that if $f(x)$ denotes any function of $x$
expressible as a power series, then

$$f(\alpha)\psi_p = f(a_p)\psi_p. \tag{12}$$

We can assume that this relation holds for more general functions. If $f(x)$ denotes
any function of the real variable $x$ whose domain includes the point $x = a_p$,
then the right-hand side of (12) has a meaning and we can define $f(\alpha)\psi_p$ by this
right-hand side. If there are several eigen-$\psi$'s belonging to the same eigenvalue $a_p$,
say $\psi'_p, \psi''_p, \ldots$, so that there can exist linear relations between them of the type

$$\sum c'\psi'_p = 0,$$

where the coefficients $c'$ are numbers, then the definition (12) is self-consistent,
since it gives

$$f(\alpha)\sum c'\psi'_p = \sum c' f(\alpha)\psi'_p = \sum c' f(a_p)\psi'_p = 0.$$

Thus if the domain of the function $f(x)$ includes all the eigenvalues of $\alpha$, we can
give a meaning to $f(\alpha)$ multiplied into any eigen-$\psi$ of $\alpha$. Further, we can give
a meaning to $f(\alpha)$ multiplied into an arbitrary $\psi$, since we can expand this
arbitrary $\psi$ in terms of eigen-$\psi$'s and multiply $f(\alpha)$ into each term of the
expansion separately.

Thus one can give a meaning to $f(\alpha)$ when $f(x)$ is any function of the real
variable $x$, even an irregular or discontinuous one, whose domain includes all
the eigenvalues of $\alpha$. If this domain contains other points besides the eigenvalues of
$\alpha$, then the values of $f(x)$ for these other points will not affect $f(\alpha)$. These results
are a necessary consequence of the physical meaning of eigenvalues. If $\alpha$ is
an observable quantity, then $f(\alpha)$ must also be observable when $f(x)$ is any
function of the real variable $x$ that has a meaning for all values of $x$ that are
possible results of the observation of $\alpha$, i.e. all eigenvalues of $\alpha$, since the same
apparatus and experiment that measure $\alpha$ really also measure $f(\alpha)$.
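The definition can be tried out on a small finite example. The following is only an illustrative sketch, not part of the original argument: it assumes a hypothetical observable on a two-dimensional real state space, specified by its eigenvalues and an orthonormal set of eigenvectors, and the names `apply_function` and `sign` are introduced here purely for illustration. An arbitrary vector is expanded in eigenvectors and the function is multiplied into each term separately, the function chosen being a discontinuous one.

```python
import math


def apply_function(f, eigenvalues, eigenvectors, psi):
    """Apply f(alpha) to psi by expanding psi in eigenvectors of alpha.

    Assumption (for this sketch only): the vectors are real and the
    eigenvectors form a complete orthonormal set.
    """
    result = [0.0] * len(psi)
    for a_p, psi_p in zip(eigenvalues, eigenvectors):
        # Coefficient of psi_p in the expansion of psi.
        c_p = sum(x * y for x, y in zip(psi_p, psi))
        # f(alpha) psi_p = f(a_p) psi_p, applied term by term.
        for i, component in enumerate(psi_p):
            result[i] += f(a_p) * c_p * component
    return result


def sign(x):
    """A discontinuous function of a real variable."""
    return 1.0 if x > 0 else -1.0


r = 1 / math.sqrt(2)
eigenvalues = [2.0, -3.0]            # hypothetical eigenvalues of alpha
eigenvectors = [[r, r], [r, -r]]     # hypothetical orthonormal eigenvectors

psi = [1.0, 0.0]
out = apply_function(sign, eigenvalues, eigenvectors, psi)
print(out)  # approximately [0.0, 1.0]
```

Only the values of the function at the two eigenvalues enter the result, in agreement with the remark that values at other points do not affect $f(\alpha)$.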

It follows from (12) that every eigen-$\psi$ of $\alpha$ is an eigen-$\psi$ of $f(\alpha)$. The converse,
that every eigen-$\psi$ of $f(\alpha)$ is an eigen-$\psi$ of $\alpha$, is not true, except when $\alpha$ is
a function (a single-valued function is of course understood) of $f(\alpha)$. Also it follows
that the eigenvalues of $f(\alpha)$ are just this function $f$ of the eigenvalues of $\alpha$,
e.g. the eigenvalues of $\alpha^2$ are the squares of those of $\alpha$. These results are obviously
necessary for the physical meanings of eigenvalues and eigenstates to be tenable.
Again, it may easily be deduced from the definition (12) that the sum or product of
two functions of an observable is a function of that observable and that a function
of a function of an observable is a function of that observable, which results are
also physically necessary.

We can use the eigen-$\phi$’s instead of the eigen-$\psi$’s in order to define $f(\alpha)$.
We then have

\[ \phi_p f(\alpha) = f(a_p)\phi_p, \]

where $\phi_p$ is any eigen-$\phi$ of $\alpha$. This equation is, according to §10, just the conjugate
imaginary equation to (12) and is thus deducible from (12). The two definitions
of $f(\alpha)$ are therefore equivalent.

The theorem will now be proved that any observable that commutes with $\alpha$
commutes also with $f(\alpha)$. This theorem is of course obvious when $f$ is expressible
as a power series. Let $\beta$ be any observable that commutes with $\alpha$, i.e. that satisfies
$\beta\alpha = \alpha\beta$. Let $\psi_p$ be an eigen-$\psi$ of $\alpha$ belonging to the eigenvalue $a_p$ and let $\phi_q$ be
an eigen-$\phi$ belonging to the eigenvalue $a_q$, which may or may not equal $a_p$, so that

\[ \alpha\psi_p = a_p\psi_p, \qquad \phi_q\alpha = a_q\phi_q. \]

We now have

\[ \phi_q\beta\alpha\psi_p = \phi_q\beta a_p\psi_p = a_p\phi_q\beta\psi_p. \]

Again

\[ \phi_q\beta\alpha\psi_p = \phi_q\alpha\beta\psi_p = a_q\phi_q\beta\psi_p. \]

Hence

\[ (a_p - a_q)\,\phi_q\beta\psi_p = 0, \]

so that either $\phi_q\beta\psi_p = 0$, or $a_p = a_q$, which would give $f(a_p) = f(a_q)$. Thus in
either case

\[ [f(a_p) - f(a_q)]\,\phi_q\beta\psi_p = 0. \]

Now

\[ \phi_q\beta f(\alpha)\psi_p = \phi_q\beta f(a_p)\psi_p = f(a_p)\phi_q\beta\psi_p \]

and again

\[ \phi_q f(\alpha)\beta\psi_p = f(a_q)\phi_q\beta\psi_p. \]

Hence

\[ \phi_q[\beta f(\alpha) - f(\alpha)\beta]\psi_p = [f(a_p) - f(a_q)]\,\phi_q\beta\psi_p = 0. \]

This result is true for any eigen-$\psi$, $\psi_p$, and is hence also true for an arbitrary $\psi$,
which can be expanded in terms of eigen-$\psi$’s. Similarly it is true for any eigen-$\phi$,
$\phi_q$, and is hence also true for an arbitrary $\phi$, which can be expanded in terms of
eigen-$\phi$’s. Hence

\[ \beta f(\alpha) - f(\alpha)\beta = 0, \]

which is the result required. In this proof it is not assumed that $\beta$ is a real
observable, although, of course, it is understood that $\alpha$ is real in order that a
general function of $\alpha$ may have a meaning.

The converse theorem will now be proved, namely, if every observable that
commutes with a real observable $\alpha$ also commutes with another observable $\xi$, then $\xi$
is a function of $\alpha$. It will first be shown that if $\psi_p$ is any eigen-$\psi$ of $\alpha$, then it is also
an eigen-$\psi$ of $\xi$. We introduce an observable $\beta$ satisfying the following conditions:

\[ \beta\psi_q = 0, \]

whenever $\psi_q$ is an eigen-$\psi$ of $\alpha$ belonging to an eigenvalue $a_q$ that differs from that
of $\psi_p$, which is $a_p$;

\[ \beta\psi_p = \psi_p \]

and

\[ \beta\psi_p' = 0, \]

whenever $\psi_p'$ is one of a set of eigen-$\psi$’s of $\alpha$ belonging to the eigenvalue $a_p$, such
that this set, together with $\psi_p$, form a complete independent set of all eigen-$\psi$’s
belonging to the eigenvalue $a_p$. We shall then have that $\psi_p$, the $\psi_p'$’s, and $\psi_q$’s
form a complete set of independent $\psi$’s, so that $\beta$ is completely defined by these
equations. It is now easily verified that

\[ \alpha\beta\psi_q = 0 = \beta\alpha\psi_q, \]
\[ \alpha\beta\psi_p = a_p\psi_p = \beta\alpha\psi_p, \]
\[ \alpha\beta\psi_p' = 0 = \beta\alpha\psi_p'. \]

Thus

\[ \alpha\beta\psi = \beta\alpha\psi \]

for arbitrary $\psi$ and $\beta$ commutes with $\alpha$. Hence, by hypothesis, $\beta$ also commutes
with $\xi$, so that

\[ \beta\xi\psi_p = \xi\beta\psi_p = \xi\psi_p. \]

Now for an arbitrary $\psi$-symbol $\psi$ one must have

\[ \beta\psi = c\psi_p, \]

where $c$ is a number, as one can easily see by expanding $\psi$ in terms of $\psi_p$, the $\psi_p'$’s
and $\psi_q$’s, and multiplying $\beta$ into each term separately. Hence

\[ \beta\xi\psi_p = c\psi_p, \]

so that

\[ \xi\psi_p = c\psi_p \]

and $\psi_p$ is an eigen-$\psi$ of $\xi$. To complete the proof that $\xi$ is a function of $\alpha$ according
to the above definition, it remains to be shown only that if two or more eigen-$\psi$’s
belong to the same eigenvalue of $\alpha$, then they also belong to the same eigenvalue
of $\xi$. The functional relation between the eigenvalues of $\xi$ and those of $\alpha$ will then
specify the function that $\xi$ is of $\alpha$. Now if two or more eigen-$\psi$’s of $\alpha$ belong to
the same eigenvalue of $\alpha$, then any linear combination of them will be an eigen-$\psi$
of $\alpha$. From what has already been proved it follows that this linear combination
must also be an eigen-$\psi$ of $\xi$, which can be the case only if the eigen-$\psi$’s, that it
is a linear combination of, all belong to the same eigenvalue of $\xi$.


16. Examples of Functions of Observables 


Some examples of elementary functions of a real observable $\alpha$ will now be
considered. The reciprocal $\alpha^{-1}$ always exists when $\alpha$ has not the eigenvalue zero.
By definition it satisfies

\[ \alpha^{-1}\psi_p = a_p^{-1}\psi_p, \]

where $\psi_p$ is an eigen-$\psi$ of $\alpha$ belonging to the eigenvalue $a_p$. Hence

\[ \alpha\alpha^{-1}\psi_p = \alpha a_p^{-1}\psi_p = \psi_p \]

and since this is true for all $\psi_p$ we must have $\alpha\alpha^{-1} = 1$. Similarly $\alpha^{-1}\alpha = 1$. Either
of these equations is sufficient to determine $\alpha^{-1}$ completely when this reciprocal
exists according to the above definition. To prove this result, suppose there are
two solutions, $(\alpha^{-1})_1$ and $(\alpha^{-1})_2$, of $\alpha\alpha^{-1} = 1$, so that

\[ \alpha(\alpha^{-1})_1 = 1, \qquad \alpha(\alpha^{-1})_2 = 1. \]

This gives

\[ \alpha\xi = 0, \tag{13} \]

where

\[ \xi = (\alpha^{-1})_1 - (\alpha^{-1})_2. \]

If $\alpha$ is such that there exists a $\xi$, not identically zero, satisfying (13), then $\alpha$ can
have no reciprocal, according to the above definition, since if such a reciprocal $\alpha^{-1}$
exists we obtain, by multiplying (13) on the left-hand side by $\alpha^{-1}$,

\[ 0 = \alpha^{-1}\alpha\xi = \xi. \]

Hence $\xi = 0$ and our two solutions of $\alpha\alpha^{-1} = 1$ are identical.

As a second example we shall take the square root of $\alpha$. This is defined by

\[ \sqrt{\alpha}\,\psi_p = \pm\sqrt{a_p}\,\psi_p. \tag{14} \]

The square root of $\alpha$ always exists, but is a real observable only provided $\alpha$ has
no negative eigenvalues. From (14) one obtains

\[ \sqrt{\alpha}\sqrt{\alpha}\,\psi_p = \sqrt{a_p}\sqrt{a_p}\,\psi_p = \alpha\psi_p, \]

so that

\[ \sqrt{\alpha}\sqrt{\alpha} = \alpha. \tag{15} \]

On account of the ambiguity of sign in (14), the square root of an observable is
to a certain extent indeterminate. In order to determine a square root completely
one must choose a particular sign for each eigenvalue $a_p$ to insert in (14), which
is the same as fixing the sign of the square root of a real variable whose domain
consists of the eigenvalues $a_p$. One can choose the sign to vary as irregularly as
one likes in passing from one eigenvalue to the next, and equation (14) will always
define an observable $\sqrt{\alpha}$ satisfying (15) that can legitimately be called a square
root of $\alpha$. If the observable $\alpha$ has two eigen-$\psi$’s belonging to one and the same
eigenvalue $a_p$, then we could define an observable $\sqrt{\alpha}$ by equation (14) with the $+$
sign for one of these eigen-$\psi$’s and the $-$ sign for the other, and with arbitrary signs
for the eigen-$\psi$’s belonging to eigenvalues other than $a_p$. This observable would
still satisfy (15), but it would not be a function of the observable $\alpha$ in accordance
with our definition, which requires a unique coefficient on the right-hand side
of (14) for each eigenvalue $a_p$, so that this coefficient will form a single-valued
function of the real variable $a_p$. The $\sqrt{\alpha}$ defined without this unique coefficient
would not, for instance, satisfy the condition of commuting with any observable
that commutes with $\alpha$. Thus, unlike what we had in the case of the reciprocal,
equation (15) is not sufficient for the definition of square-root functions, but must
be supplemented by the condition that the observable that is being defined is
actually a function of $\alpha$. The number of different square-root functions is $2^n$, where
$n$ is the number of different eigenvalues of $\alpha$. The most useful one is usually that,
which exists only when all the eigenvalues of $\alpha$ are positive, for which the positive
sign is taken in every case.
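The counting of square-root functions can be checked by direct enumeration. The sketch below is illustrative only: it assumes a hypothetical observable whose distinct eigenvalues are all positive (so that the two signs give different values), and the name `square_root_functions` is invented for this example. One sign is chosen per distinct eigenvalue, which is exactly the single-valuedness condition described above.

```python
import itertools
import math


def square_root_functions(distinct_eigenvalues):
    """Return every single-valued square-root function on the eigenvalues.

    Each function is a dict mapping an eigenvalue a_p to one choice of
    +sqrt(a_p) or -sqrt(a_p). Assumption: the eigenvalues are distinct
    and positive.
    """
    functions = []
    for signs in itertools.product((+1, -1), repeat=len(distinct_eigenvalues)):
        functions.append(
            {a: s * math.sqrt(a) for a, s in zip(distinct_eigenvalues, signs)}
        )
    return functions


eigenvalues = [1.0, 4.0, 9.0]        # hypothetical eigenvalues, n = 3
roots = square_root_functions(eigenvalues)
print(len(roots))  # 8, i.e. 2**3
```

Every dictionary in `roots` squares back to the original eigenvalues, and the entry with all signs positive corresponds to the choice singled out in the text as the most useful.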

As an example of a non-analytical function we may take the modulus $|\alpha|$ of
the observable $\alpha$. This is defined by

\[ |\alpha|\,\psi_p = |a_p|\,\psi_p \]

and is quite a proper observable, in spite of the fact that the corresponding function
of a real variable is not analytic (its derivative is discontinuous at the origin), and
may be used freely in the analysis when desired.


17. Simultaneous Eigenstates 


A state $\psi$ may be simultaneously an eigenstate of two observables $\alpha$ and $\beta$,
i.e. it may satisfy both

\[ \alpha\psi = a\psi \]

and

\[ \beta\psi = b\psi, \]

where $a$ and $b$ are numbers. We should then have

\[ \alpha\beta\psi = ab\psi = \beta\alpha\psi \]

or

\[ (\alpha\beta - \beta\alpha)\psi = 0. \]

This suggests that the chances for the existence of a simultaneous eigenstate of two
observables $\alpha$ and $\beta$ are most favourable when $\alpha\beta - \beta\alpha = 0$, i.e. when $\alpha$ and
$\beta$ commute. When $\alpha$ and $\beta$ do not commute the possibility for the existence of
a simultaneous eigenstate is not absolutely ruled out, but the occurrence of such
a state is exceptional. On the other hand, when $\alpha$ and $\beta$ commute there exist
so many simultaneous eigenstates, that, as will now be proved, an arbitrary state
can be expanded in terms of them. We thus get a generalization of the expansion
theorem of §14.

Let $\alpha$ and $\beta$ be two observables that commute and let $\psi_a$ be an eigen-$\psi$ of $\alpha$
belonging to the eigenvalue $a$. By the expansion theorem of §14 we can expand $\psi_a$
in terms of eigen-$\psi$’s of $\beta$, thus

\[ \psi_a = \sum_b \psi_b, \tag{16} \]

where $\psi_b$ is an eigen-$\psi$ of $\beta$ belonging to the eigenvalue $b$. It will now be proved
that each $\psi_b$ in this expansion is an eigen-$\psi$ also of $\alpha$ and is thus a simultaneous
eigen-$\psi$ of $\alpha$ and $\beta$. If $f(\beta)$ is any function of the observable $\beta$, we have

\[ \alpha f(\beta)\psi_a = \alpha\sum_b f(\beta)\psi_b = \alpha\sum_b f(b)\psi_b \]

from the definition of a function given in §15. Now from a theorem of §15, since $\alpha$
commutes with $\beta$ it must also commute with $f(\beta)$, so that

\[ \alpha f(\beta)\psi_a = f(\beta)\alpha\psi_a = f(\beta)a\psi_a = a f(\beta)\sum_b \psi_b = a\sum_b f(b)\psi_b. \]

Hence

\[ \alpha\sum_b f(b)\psi_b = a\sum_b f(b)\psi_b. \tag{17} \]

Now $f(b)$ is an arbitrary function of the real variable $b$, so that for each value of $b$
in the domain of $b$, $f(b)$ is an arbitrary number. Hence we can equate coefficients
of $f(b)$ in (17), which gives

\[ \alpha\psi_b = a\psi_b. \]

Thus each of the $\psi_b$’s in the expansion (16) is an eigen-$\psi$ of $\alpha$ belonging to
the same eigenvalue $a$ as that of our original $\psi_a$ and is thus a simultaneous eigen-$\psi$
of $\alpha$ and $\beta$. Any eigen-$\psi$ $\psi_a$ of $\alpha$ can therefore be expanded in terms of these
simultaneous eigen-$\psi$’s. But an arbitrary $\psi$ can be expanded in terms of $\psi_a$’s, and
hence an arbitrary $\psi$ can be expanded in terms of simultaneous eigen-$\psi$’s.

The converse theorem is also easily proved, namely, if two observables $\alpha$ and $\beta$
are such that an arbitrary $\psi$ can be expanded in terms of the simultaneous eigen-$\psi$’s
of $\alpha$ and $\beta$, then $\alpha$ and $\beta$ commute. We have, in fact, if $\psi_{ab}$ is a simultaneous
eigen-$\psi$ of $\alpha$ and $\beta$ belonging to the eigenvalues $a$ and $b$ respectively, the equation

\[ (\alpha\beta - \beta\alpha)\psi_{ab} = (ab - ba)\psi_{ab} = 0. \]

Hence

\[ (\alpha\beta - \beta\alpha)\psi = 0, \]

where $\psi$ is any $\psi$-symbol that can be expanded in terms of the $\psi_{ab}$’s. If this is true
for an arbitrary $\psi$, we can infer that

\[ \alpha\beta - \beta\alpha = 0, \]

as required.

The idea of simultaneous eigen-$\psi$’s may obviously be extended to more than
two observables and the theorem just proved still holds, i.e. an arbitrary $\psi$ can
be expanded in terms of the simultaneous eigen-$\psi$’s of any set of observables
that commute, and the converse holds also. The same arguments used for the proof
in the case of two observables are adequate for the general case, e.g. if we have
three observables $\alpha$, $\beta$, $\gamma$ that commute, each with the other two, we can expand
any simultaneous eigen-$\psi$ of $\alpha$ and $\beta$ in terms of eigen-$\psi$’s of $\gamma$ and then show that
each of these eigen-$\psi$’s of $\gamma$ is also an eigen-$\psi$ of $\alpha$ and $\beta$.

The fact that there is an expansion theorem for two or more observables that
commute, the same as that for a single observable, means that a set of two or more
observables that commute has many of the properties of a single observable and can
for many purposes be counted as a single observable, the result of a measurement
of which is expressible by two or more numbers. Thus the theory of functions of
a single observable developed in §15 can be applied without change to functions
of two or more observables that commute. If $\alpha, \beta, \gamma, \ldots$ are a set of observables
that commute, we define a general function of them, $f(\alpha, \beta, \gamma, \ldots)$, by

\[ f(\alpha, \beta, \gamma, \ldots)\,\psi_{abc\ldots} = f(a, b, c, \ldots)\,\psi_{abc\ldots}, \]

where $\psi_{abc\ldots}$ is a simultaneous eigen-$\psi$ of $\alpha, \beta, \gamma, \ldots$ belonging to the eigenvalues
$a, b, c, \ldots$ respectively, and $f(a, b, c, \ldots)$ is a function of the real variables $a, b, c, \ldots$
whose domains consist of the eigenvalues of $\alpha, \beta, \gamma, \ldots$ respectively. The theorems
given in §15 about functions of single observables will apply also to functions of sets
of observables that commute, the proofs being formally equivalent in the two cases.
For example, we shall have the theorem that any observable that commutes with
each of a set of commuting observables $\alpha, \beta, \gamma, \ldots$ will commute also with any
function of them, $f(\alpha, \beta, \gamma, \ldots)$.

If we take the maximum possible number of independent observables that
commute, the condition of independence being that no one of them can be
expressed as a function of the others, then there cannot be more than one
simultaneous eigenstate for them all belonging to a specified set of eigenvalues.
To prove this result, let $\alpha_r$ be the set of commuting observables and suppose there
are two independent simultaneous eigen-$\psi$’s, $\psi_1$ and $\psi_2$, of all the $\alpha_r$’s belonging
to the same set of eigenvalues. Introduce the new observable $\beta$ defined by

\[ \beta\psi_1 = \psi_1, \qquad \beta\psi_2 = 0, \qquad \beta\psi_3 = 0, \]

whenever $\psi_3$ is a simultaneous eigen-$\psi$ belonging to a different set of eigenvalues.
Then this $\beta$ commutes with all the $\alpha$’s and also it is not a function of them, as may
be seen from the fact that any linear combination of $\psi_1$ and $\psi_2$ is a simultaneous
eigen-$\psi$ of all the $\alpha$’s but is not an eigen-$\psi$ of $\beta$, so that the set of $\alpha$’s does not
contain the maximum possible number of independent commuting observables.
Hence, when the set of $\alpha$’s does satisfy the given conditions, each eigenstate must
be uniquely determined by the eigenvalues to which it belongs. Such a set we call
a complete set of commuting observables.


18. Some Probability Theorems 


We shall now determine the probability of a given result being obtained when
an observation is made on the system in a given state. For this purpose the only
physical assumption we shall make use of is that given in §11 for the average
value of an observable. To determine the probability that an observable $\alpha$ shall be
found to have the value $a$ when a measurement of it is made for the system in
a state $\psi$, we use the fact that if a measurement is made of $f(\alpha)$, any function
of $\alpha$, the average result obtained will be

\[ \phi f(\alpha)\psi, \]

where $\phi$ is the conjugate imaginary of $\psi$, provided $\phi$ and $\psi$ are normalized.
Suppose $\phi$ and $\psi$ to be expanded in terms of eigen-$\phi$’s and eigen-$\psi$’s, thus

\[ \phi = \sum_a \phi_a, \qquad \psi = \sum_{a'} \psi_{a'}, \tag{18} \]

where $\phi_a$ belongs to the eigenvalue $a$ and $\psi_{a'}$ to $a'$. The expression for the average
of $f(\alpha)$ now becomes

\[ \sum_a \phi_a f(\alpha) \sum_{a'} \psi_{a'} = \sum_{a,a'} f(a')\phi_a\psi_{a'} = \sum_a f(a)\phi_a\psi_a, \tag{19} \]

when we use the theorem of §13 that eigenstates belonging to different eigenvalues
are orthogonal. Now if $P(a)$ is the probability of the observable $\alpha$ being
found to have the value $a$, the average value of $f(\alpha)$ must be $\sum_a f(a)P(a)$,
since the ordinary probability rules will apply in this case. Equating this expression
to (19), we find

\[ \sum_a f(a)P(a) = \sum_a f(a)\phi_a\psi_a. \]

This holds when $f(a)$ is an arbitrary function of the real variable $a$, so that we must
be able to equate coefficients of $f(a)$, which gives

\[ P(a) = \phi_a\psi_a. \tag{20} \]

We can easily verify that this expression for $P(a)$ gives unity for the total
probability of $\alpha$ having any value, since from the normalizing condition for $\phi$
and $\psi$ we find

\[ \sum_a \phi_a \sum_{a'} \psi_{a'} = 1, \]

or

\[ \sum_a \phi_a\psi_a = 1. \]

We can put the expression (20) in a different form by inserting numerical
coefficients in the expansions (18) so that they read

\[ \phi = \sum_a \bar{c}_a\phi_a, \qquad \psi = \sum_{a'} c_{a'}\psi_{a'}, \]

and taking the $\phi_a$’s and $\psi_a$’s to be normalized. We then get for $P(a)$

\[ P(a) = \bar{c}_a\phi_a\, c_a\psi_a, \]

which reduces to

\[ P(a) = |c_a|^2, \]

so that the probability of $\alpha$ having any given value is equal to the square of
the modulus of the corresponding coefficient in the expansion.
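A short numerical illustration of this rule may be helpful. It is a sketch under stated assumptions, not anything given in the text: the eigenvalues and coefficients are hypothetical, the eigenvectors are assumed normalized and mutually orthogonal, and the function names `probabilities` and `average` are introduced only here. The coefficients are normalized inside the code, so the state need not be normalized beforehand.

```python
def probabilities(coefficients):
    """Return P(a) = |c_a|^2 for a dict {eigenvalue: coefficient}.

    The coefficients are normalized here, so that the total
    probability comes out as unity.
    """
    norm = sum(abs(c) ** 2 for c in coefficients.values())
    return {a: abs(c) ** 2 / norm for a, c in coefficients.items()}


def average(f, probs):
    """Average of f(alpha), computed as the sum over a of f(a) P(a)."""
    return sum(f(a) * p for a, p in probs.items())


# Hypothetical expansion coefficients c_a, keyed by eigenvalue a.
coeffs = {-1.0: 1 + 1j, 0.0: 1j, 2.0: 1.0}

P = probabilities(coeffs)
mean_alpha = average(lambda a: a, P)
mean_square = average(lambda a: a * a, P)
print(P, mean_alpha, mean_square)
```

With these coefficients the three probabilities are close to 0.5, 0.25 and 0.25, and they sum to unity as required by the normalizing condition.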

From this it follows at once that if the state $\psi$ is an eigenstate belonging
to the eigenvalue $a$, the probability of $\alpha$ having the value $a$ is unity. Thus the
result that if $\alpha\psi = a\psi$, $\alpha$ certainly has the value $a$ for the state $\psi$, is deducible
from the general assumption for the average value of an observable. A second
immediate consequence is that any result, $a$ say, for an observation of $\alpha$ on
the system in the state $c_1\psi_1 + c_2\psi_2$ has a finite probability of being the result
when this observation is made for either state $\psi_1$ or state $\psi_2$, since if the term
belonging to the eigenvalue $a$ in the expansion of $c_1\psi_1 + c_2\psi_2$ in eigen-$\psi$’s of $\alpha$
does not vanish, that in the expansion either of $\psi_1$ or of $\psi_2$ must also not vanish.
This shows that the definition of superposition given in §6 is equivalent to that
contained in the symbolic algebra, together with the interpretation of this algebra
that $\phi\alpha\psi$ is the average of $\alpha$.

The results we have just obtained all remain true when we replace
the observable $\alpha$ by a set of two or more observables that commute, the proofs
being formally unaltered. Thus, we shall have that if $\psi$ is expanded in terms of
simultaneous eigen-$\psi$’s of two observables, $\alpha$ and $\beta$, that commute, i.e.

\[ \psi = \sum_{a,b} \psi_{ab}, \]

where $\psi_{ab}$ is a simultaneous eigen-$\psi$, belonging to the eigenvalues $a$ and $b$ for $\alpha$
and $\beta$ respectively, then the probability that the results $a$ and $b$ shall be obtained
from measurements of $\alpha$ and $\beta$ for the state $\psi$ will be $\phi_{ab}\psi_{ab}$ when $\psi$ is normalized.
The existence of a definite probability for these results, independent of the order
in which the observations are made, requires that the observations shall not
interfere with each other and suggests that the condition that two observables
commute is equivalent to the condition that the two observations are compatible.
A formal proof of this will now be given. Before we can do this we must
obtain a mathematical form for the condition that an observation is made with
the minimum of disturbance, which we have hitherto discussed only qualitatively.

Consider an observation, consisting of the measurement of an observable $\alpha$,
to be made on a system in the state $\psi$. The state of the system after the observation
must be an eigenstate of $\alpha$, since the result of a measurement of $\alpha$ for this state
must be a certainty. Now suppose the observation to be made in such a way that
the state of the system afterwards is always one of those that occur in the expansion
of the initial $\psi$ in terms of eigen-$\psi$’s of $\alpha$, i.e. one of the $\psi_a$’s in

\[ \psi = \sum_a \psi_a. \]

This is permissible since there is one eigen-$\psi$ $\psi_a$ in the expansion for every
eigenvalue $a$ that has a finite probability of being the result of the observation.
This observation of $\alpha$ may then conveniently be defined to be the one that causes
the minimum of disturbance to the system. Observations that cause the minimum
disturbance are thus those with the property that, by a superposition of all
the possible states after the observation, the state before the observation may be
formed, or those with the property that any result that can be obtained from any
observation on the system in the initial state is a possible result when the same
observation is made on the system in one of the final states. It is observations
with this property that should be understood in the discussion on compatibility
in §4. Granting the existence of observations with this property, there is a physical
necessity for the expansion theorem of §14.

Now let $\alpha$ and $\beta$ be two observables that commute and let any state $\psi$ be
expanded in terms of simultaneous eigen-$\psi$’s of $\alpha$ and $\beta$, thus

\[ \psi = \sum_{a,b} \psi_{ab}. \]

The expansion of $\psi$ in terms of eigen-$\psi$’s of $\alpha$ must then be

\[ \psi = \sum_a \psi_a, \tag{21} \]

where

\[ \psi_a = \sum_b \psi_{ab}, \tag{22} \]

and similarly the expansion of $\psi$ in terms of eigen-$\psi$’s of $\beta$ must be

\[ \psi = \sum_b \psi_b, \tag{23} \]

where

\[ \psi_b = \sum_a \psi_{ab}. \tag{24} \]

The suffixes in each case denote the corresponding eigenvalues. If $\psi$ is
normalized, then the probability for this state of the result $b$ being obtained
from a measurement of $\beta$ will be $\phi_b\psi_b$. When this result is obtained, the state
of the system after the observation will be $\psi_b$, if the observation is made with
the minimum of disturbance according to the above definition. If an observation is
now made of $\alpha$ for this final state $\psi_b$, the probability of the result $a$ being obtained
will be, from (24),

\[ \phi_{ab}\psi_{ab} / \phi_b\psi_b, \]

the denominator arising from the fact that the symbol $\psi_b$ is not normalized.
Thus the probability of first the result $b$ being obtained for $\beta$ and then the result $a$
for $\alpha$ will be, by multiplication, $\phi_{ab}\psi_{ab}$. The total probability of the result $a$ being
obtained for the second observation with any result for the first must therefore be

\[ \sum_b \phi_{ab}\psi_{ab}. \]

If, now, an observation of $\alpha$ were made on the system in the initial state $\psi$, with
no observation at all of $\beta$, the probability of the result $a$ being obtained would be,
from (21), $\phi_a\psi_a$. On account of (22), this must equal

\[ \sum_b \phi_{ab} \sum_{b'} \psi_{ab'} = \sum_b \phi_{ab}\psi_{ab}, \]

from the orthogonality theorem of §12, which is the same as the probability
that the result $a$ shall be obtained for $\alpha$ after an observation of $\beta$. This is just
the condition that $\alpha$ and $\beta$ shall be compatible according to §4.
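The bookkeeping in this argument can be followed in a small numerical sketch. It is offered only as an illustration: the table `weights` holds hypothetical values of the products $\phi_{ab}\psi_{ab}$ for a normalized state, and both function names are invented for the example. One function computes the probability of the result $a$ directly, the other computes it when $\beta$ is observed first with the minimum of disturbance.

```python
def prob_a_direct(weights, a):
    """Probability of result a for alpha, with no observation of beta."""
    return sum(w for (a2, _b), w in weights.items() if a2 == a)


def prob_a_after_beta(weights, a):
    """Probability of result a for alpha when beta is observed first.

    For each result b: P(b) is the sum over a of the weights, and the
    conditional probability of a is weights[(a, b)] / P(b).
    """
    total = 0.0
    for b in sorted({b for (_a, b) in weights}):
        p_b = sum(w for (_a, b2), w in weights.items() if b2 == b)
        if p_b == 0.0:
            continue  # this result of beta never occurs
        total += p_b * (weights.get((a, b), 0.0) / p_b)
    return total


# Hypothetical values of phi_ab psi_ab, keyed by the eigenvalues (a, b).
weights = {(1, 1): 0.1, (1, 2): 0.3, (2, 1): 0.2, (2, 2): 0.4}

direct = prob_a_direct(weights, 1)
after = prob_a_after_beta(weights, 1)
print(direct, after)
```

The two numbers agree to rounding error, which is the statement of compatibility in this simple setting.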

The converse will now be proved, that if the measurements of two observables
$\alpha$ and $\beta$ are two compatible observations, then $\alpha$ and $\beta$ commute. It was shown in
§4 that if the compatible observations $\alpha$ and $\beta$ are both made on the system in any
state $\psi$, the final state will be such that the result for either observation with this
state will be a certainty, i.e. the final state will be a simultaneous eigenstate for $\alpha$
and $\beta$. If the observations are made with the minimum of disturbance according
to the above definition, then the initial state $\psi$ must be capable of being expanded
in terms of all the possible final states. Thus an arbitrary $\psi$ can be expanded in
terms of simultaneous eigen-$\psi$’s of $\alpha$ and $\beta$, so that $\alpha$ and $\beta$ must commute.

The identification of the condition of commutability of observables with that
of the compatibility of the observations allows us to see a physical necessity for
the theorem of §15 that any observable that commutes with an observable $\alpha$
commutes also with $f(\alpha)$, any function of $\alpha$. This theorem may now be stated
in the form that any observation that is compatible with the observation of $\alpha$ is
compatible also with the observation of $f(\alpha)$ and is thus physically obvious, since
any observation of $\alpha$ is by that very fact also an observation of $f(\alpha)$.

It will now be shown that the fact that the probability of agreement of two
states $\psi_1$ and $\psi_2$ is $|\phi_2\psi_1|^2$, when $\psi_1$ and $\psi_2$ are normalized, is deducible from
the general assumption for the average of an observable. It has been shown that
from this general assumption one can deduce that the probability of an observable
$\alpha$ having the value $a$ for the state $\psi_1$ is $|c_a|^2$, where $c_a$ is the coefficient of the eigen-$\psi$
belonging to the eigenvalue $a$ in the expansion of $\psi_1$ in terms of eigen-$\psi$’s of $\alpha$,

\[ \psi_1 = \sum_a c_a\psi_a, \tag{25} \]

when $\psi_1$ and all the $\psi_a$’s are normalized. This result is still true when $\alpha$
denotes a set of commuting observables $\alpha_r$ and $\psi_a$ is a simultaneous eigen-$\psi$
belonging to the set of eigenvalues $a_r$. There is one maximum observation,
the result of which for the state $\psi_2$ is a certainty. This maximum observation
will consist in the measurement of a set of commuting observables $\alpha_r$, which
set must be a complete set, in the sense defined at the end of the preceding
section, if the observation is really a maximum one. The state $\psi_2$ is then
a simultaneous eigen-$\psi$ of all these observables $\alpha_r$ and there is no other
simultaneous eigen-$\psi$ belonging to the same set of eigenvalues as $\psi_2$ does.
That term in the expansion (25) which belongs to the same set of eigenvalues
as $\psi_2$ must therefore be just $\psi_2$ itself or differ from it by a trivial numerical factor.
The probability of agreement of $\psi_1$ with $\psi_2$, which is the probability that the result
of the observation of the $\alpha$’s for state $\psi_1$ is the same as for state $\psi_2$, is therefore
$|c_{a_2}|^2$, where $c_{a_2}$ is the coefficient of that $\psi_a$ in (25) that is just $\psi_2$. But from
the orthogonality theorem, one finds that $\phi_2\psi_1$ is equal to just this coefficient $c_{a_2}$,
so that the probability of agreement is $|\phi_2\psi_1|^2$.


19. Contact Transformations 


The following important theorem in the theory of eigenvalues will now be proved.
If $S$ is any observable having a reciprocal $S^{-1}$ and $\alpha$ is any observable, then the
eigenvalues of $S\alpha S^{-1}$ are the same as those of $\alpha$. Let $a$ be any eigenvalue of $\alpha$
and let $\psi_a$ be an eigen-$\psi$ of $\alpha$ belonging to it, so that

\[ \alpha\psi_a = a\psi_a. \]

This gives

\[ S\alpha S^{-1}S\psi_a = S\alpha\psi_a = Sa\psi_a = aS\psi_a. \]

Hence $S\psi_a$ is an eigen-$\psi$ of $S\alpha S^{-1}$ belonging to the eigenvalue $a$. Conversely, as
may be shown in a similar way, if $a$ is any eigenvalue of $S\alpha S^{-1}$ and $\psi$ is an eigen-$\psi$
of $S\alpha S^{-1}$ belonging to it, then $a$ is also an eigenvalue of $\alpha$ and $S^{-1}\psi$ is an eigen-$\psi$
of $\alpha$ belonging to it.
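The theorem is easily checked on two-by-two matrices. The following sketch is illustrative and rests on assumptions of its own: the matrices `alpha` and `S` are hypothetical, `S` is merely invertible (it is not required to be real or to satisfy any further condition), and the helper names are introduced only for this example.

```python
def matmul(x, y):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inverse(m):
    """Inverse of a 2x2 matrix with non-zero determinant."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def matvec(m, v):
    """Apply a 2x2 matrix to a 2-component vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


alpha = [[1.0, 0.0], [0.0, 3.0]]     # hypothetical observable, eigenvalues 1 and 3
S = [[1.0, 2.0], [0.0, 1.0]]         # hypothetical invertible S
transformed = matmul(matmul(S, alpha), inverse(S))

psi_a = [0.0, 1.0]                   # eigenvector of alpha with eigenvalue 3
S_psi = matvec(S, psi_a)             # candidate eigenvector of S alpha S^-1
check = matvec(transformed, S_psi)   # should equal 3 * S_psi
print(transformed, S_psi, check)
```

Here `check` comes out as three times `S_psi`, so the transformed vector belongs to the same eigenvalue, as the proof asserts.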

It is not necessary for this theorem that $S$ should be a real observable. If $S$ is
not real we cannot use the general definition of a function of an observable in order
to define $S^{-1}$, but must use instead the conditions $SS^{-1} = S^{-1}S = 1$, which are
sufficient for the proof of the theorem. $S$ can be any observable such that there
exists an $S^{-1}$ satisfying these conditions. It is also not necessary for the theorem
to be true that $\alpha$ should be a real observable, but since the only eigenvalues of
interest in quantum mechanics are those of real observables, the theorem is useful
only when both $\alpha$ and $S\alpha S^{-1}$ are real. This imposes a condition on $S$. If $S\alpha S^{-1}$
is to be real whenever $\alpha$ is real, we must have, from the rule (22) of §10,

\[ S\alpha S^{-1} = \overline{S\alpha S^{-1}} = \overline{S^{-1}}\,\bar{\alpha}\,\bar{S} = \overline{S^{-1}}\,\alpha\,\bar{S}, \]

which requires, ignoring possible trivial numerical factors,

\[ \bar{S} = S^{-1}, \qquad \overline{S^{-1}} = S. \]

Either of these conditions is a consequence of the other.

When $S$ satisfies these conditions, the transformation from a set of observables
$\alpha_r$ to the set $\beta_r = S\alpha_r S^{-1}$ is called a contact transformation of observables,
since, as we shall see later, it is analogous to a contact transformation of
classical mechanics. Each of the new observables $\beta_r$ has the same eigenvalues
as the corresponding original one $\alpha_r$. Further, the transformation has other
remarkable properties, namely, if any algebraic relation holds between some of
the $\alpha$’s, the same relation holds between the corresponding $\beta$’s, and if one of the $\alpha$’s
is a function of another one according to the general definition, the same functional
relation holds between the corresponding $\beta$’s.

To prove the first of these two properties, we observe that any algebraic relation
between the $\alpha$’s may be written in a rational integral form of the type

\[ \sum c\,\alpha_p\alpha_q\cdots\alpha_r = 0, \]

the summation consisting of an arbitrary number of terms, each consisting of
an arbitrary number of factors, and the $c$’s being arbitrary numerical coefficients.
From this we deduce, by multiplying by $S$ on the left and $S^{-1}$ on the right,
the result

\[ S\sum c\,\alpha_p\alpha_q\cdots\alpha_r\,S^{-1} = 0 \]

or

\[ \sum c\,S\alpha_pS^{-1}S\alpha_qS^{-1}\cdots S\alpha_rS^{-1} = 0 \]

or

\[ \sum c\,\beta_p\beta_q\cdots\beta_r = 0, \]

which is the result required.

To prove the second of these properties, suppose $\alpha_2 = f(\alpha_1)$, where $f(a)$ is
a function of the real variable $a$ defined for each of the eigenvalues of $\alpha_1$. Since these
are also the eigenvalues of $\beta_1$, we can give a meaning to $f(\beta_1)$. Let $\psi_a$ be an eigen-$\psi$
of $\alpha_1$, belonging to the eigenvalue $a$. We then have

\[ f(\alpha_1)\psi_a = f(a)\psi_a. \tag{26} \]

But $S\psi_a$ must be an eigen-$\psi$ of $S\alpha_1S^{-1}$ or $\beta_1$ belonging to the eigenvalue $a$ of $\beta_1$,
so that we must also have

\[ f(\beta_1)S\psi_a = f(a)S\psi_a. \tag{27} \]

Multiplying (26) by $S$ on the left, we obtain

\[ Sf(\alpha_1)S^{-1}S\psi_a = Sf(a)\psi_a = f(\beta_1)S\psi_a \tag{28} \]

from (27). Now $S\psi_a$ is an arbitrary eigen-$\psi$ of $\beta_1$, so that any $\psi$ can be expanded
in terms of $S\psi_a$’s. Hence we can equate coefficients of $S\psi_a$ in (28), which gives

\[ f(\beta_1) = Sf(\alpha_1)S^{-1} = S\alpha_2S^{-1} = \beta_2 \]

as required.

If two contact transformations are applied successively, the result is another
contact transformation. To see how this comes about, consider the transformation
$\beta_r = S\alpha_rS^{-1}$ from the $\alpha$’s to the $\beta$’s and the transformation $\gamma_r = T\beta_rT^{-1}$ from
the $\beta$’s to the $\gamma$’s. We have then

\[ \gamma_r = TS\alpha_rS^{-1}T^{-1}. \]

Now

\[ (TS)(S^{-1}T^{-1}) = 1 \]

and

\[ (S^{-1}T^{-1})(TS) = 1, \]

so that we can put

\[ S^{-1}T^{-1} = (TS)^{-1}. \]

The connexion between the $\alpha$’s and $\gamma$’s now becomes

\[ \gamma_r = TS\alpha_r(TS)^{-1}, \]

which is a contact transformation.

If the observable $S$ in the transformation $\beta = S\alpha S^{-1}$ differs from unity only
by an infinitesimal, we get an infinitesimal contact transformation. Suppose

\[ S = 1 + iA, \]

where $A$ is very small, so that its square may be neglected. (A small observable
is one whose eigenvalues are all small, or whose average for any state is small.)
We then have

\[ S^{-1} = 1 - iA, \]

since this gives $SS^{-1} = S^{-1}S = 1$ with neglect of $A^2$. The transformation equation
now becomes

\[ \beta = (1 + iA)\alpha(1 - iA), \]

which gives

\[ \beta - \alpha = i(A\alpha - \alpha A), \tag{29} \]

with neglect of $A^2$. This is the standard form for an infinitesimal contact
transformation. In order that $\beta - \alpha$ may be a real observable when $\alpha$ is real,
$A$ must be a real observable.
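The neglected terms can be displayed explicitly. The following worked expansion is added only as a sketch, taking $S = 1 + iA$ and $S^{-1} = 1 - iA$; it shows where the quantities of the second order in $A$ arise.

```latex
\begin{align*}
S S^{-1} &= (1 + iA)(1 - iA) = 1 + A^{2} \approx 1, \\
S\alpha S^{-1} &= (1 + iA)\,\alpha\,(1 - iA)
  = \alpha + iA\alpha - i\alpha A + A\alpha A \\
 &\approx \alpha + i(A\alpha - \alpha A).
\end{align*}
```

The terms $A^{2}$ and $A\alpha A$ are both of the second order in $A$, and these are the only terms dropped in arriving at the standard form.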

As an example of contact transformation theory, we shall obtain some more
information about the observables $p$ and $q$ of §12, satisfying equation (26) of that
section. We apply the theorem that $p$ has the same eigenvalues as $SpS^{-1}$, taking
for $S$ the expression $e^{icq}$, where $c$ is a real number, which makes $S^{-1} = \bar{S}$. We now
find

\[ SpS^{-1} = e^{icq}\,p\,e^{-icq} = (p - c)\,e^{icq}e^{-icq} = p - c, \]

with the help of equation (28) of §12. Thus $p$ has the same eigenvalues as $p - c$,
which are just $c$ less than those of $p$, so that if $a$ is any eigenvalue of $p$, $a - c$ must
be another. This is true for arbitrary $c$, so that $p$ must have as eigenvalues all
numbers from $-\infty$ to $+\infty$. Similarly it may be proved that $q$ has as eigenvalues
all numbers from $-\infty$ to $+\infty$. These results are necessary consequences of the
single algebraic condition $qp - pq = i$.
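The step that moves $p$ through the exponential may be sketched as follows. This is a hedged reconstruction and not a quotation of §12: it assumes that the algebraic condition reads $qp - pq = i$ and that $f$ is expressible as a power series in $q$. Whether the second line coincides with equation (28) of §12 cannot be verified from the present section.

```latex
\begin{align*}
q^{n}p - p\,q^{n} &= i\,n\,q^{n-1}
  && \text{(by induction from } qp - pq = i\text{)}, \\
f(q)\,p - p\,f(q) &= i\,f'(q)
  && \text{(for } f \text{ a power series in } q\text{)}, \\
e^{icq}\,p - p\,e^{icq} &= i\,(ic)\,e^{icq} = -c\,e^{icq}, \\
e^{icq}\,p &= (p - c)\,e^{icq}.
\end{align*}
```

Multiplying the last line by $e^{-icq}$ on the right gives $e^{icq}\,p\,e^{-icq} = p - c$.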


IV. REPRESENTATIONS OF 
STATES AND OBSERVABLES 


20. General Properties 


IN the two preceding chapters we dealt with certain abstract symbols, denoting 
states and observables, whose exact nature was not specified, but which were 
assumed to obey certain definite laws. In the present chapter we shall 
consider representations of these abstract symbols, i.e. sets of numbers having
properties that correspond completely to those of the symbols they represent. 
When once one has found such a representation and has understood the nature of 
the correspondence, one can obtain all the properties of the abstract symbols that 
one wants by dealing entirely with their representatives, to which, since they are 
just sets of numbers, ordinary mathematical methods apply. One cannot, of course, 
obtain in this way any relation between the abstract symbols that one could 
not obtain directly from the algebra of the abstract symbols without the help 
of a representation. One can, however, often obtain results much more easily 
and conveniently with the help of a representation than without it, and further 
the numbers occurring in a representation have often a very direct physical 
interpretation, so that representations are of great use in applications of the theory. 

Suppose we have a complete set of independent $\psi$’s, the general member of
the set being denoted by $\psi_p$. The fact that the set is complete means that every
$\psi$ can be expressed as a sum of members of the set in the form

\[ \psi = \sum_p a_p\psi_p, \tag{1} \]

where the coefficients $a_p$ are numbers. The fact that the members of the set
are independent requires that an expansion of the form (1) is unique,
since if an alternative expansion

\[ \psi = \sum_p a_p'\psi_p \]

were possible, we could obtain by subtraction

\[ 0 = \sum_p (a_p - a_p')\psi_p, \]

which can be true with independent $\psi_p$’s only if $a_p = a_p'$ for all $p$. Thus according
to (1) each $\psi$ determines uniquely a set of numbers $a_p$ and, conversely, each set of
arbitrary numbers $a_p$ determines a $\psi$. There is a one-one correspondence between
the $\psi$’s and the sets of numbers $a_p$.

If $\psi_a$ corresponds to the set of numbers $a_p$ and $\psi_b$ to the set $b_p$, we have

$$\psi_a = \sum_p a_p\psi_p, \qquad \psi_b = \sum_p b_p\psi_p,$$

and hence

$$\psi_a + \psi_b = \sum_p (a_p + b_p)\,\psi_p,$$

so that $\psi_a + \psi_b$ corresponds to the set $(a_p + b_p)$. Also, if $c$ is any number,
$c\psi_a$ corresponds to the set $ca_p$. Thus all the properties of the $\psi$'s of addition
and multiplication by numbers are possessed also by the sets of numbers $a_p$
corresponding to them. The sets of numbers thus form a representation of
the $\psi$'s, each $\psi$ being represented by one set $a_p$ defined by (1). The $\psi_p$'s will be
referred to as the fundamental $\psi$'s of the representation. If we take a different
set of fundamental $\psi$'s, we shall get a different set of numbers to represent
each $\psi$, so that we shall get a new representation. There is one representation
for each complete set of independent $\psi$'s, since they may always be taken as
fundamental $\psi$'s. In the vector picture of the $\psi$'s the numbers representing any
$\psi$ are its co-ordinates relative to certain axes (which may be oblique), which
are determined by the fundamental $\psi$'s. The different representations are then
the co-ordinates referred to different axes. A state is defined by the ratios of a set
of numbers $a_p$ to each other, since a $\psi$ can be multiplied by an arbitrary number
and will still represent the same state.
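
As a small numerical illustration of the correspondence set up by (1), the following Python sketch takes two oblique fundamental $\psi$'s in a real two-dimensional vector picture and solves for the co-ordinates $a_p$ of a given $\psi$. The particular vectors and the helper name `coordinates` are assumptions made only for this example; they do not come from the text.

```python
# A minimal sketch, assuming a real two-dimensional vector picture of the psi's.
# The fundamental psi's below are oblique (not perpendicular) on purpose.

FUNDAMENTAL = [(1.0, 0.0), (1.0, 1.0)]   # hypothetical psi_1 and psi_2


def coordinates(psi, basis=FUNDAMENTAL):
    """Return (a_1, a_2) with psi = a_1*psi_1 + a_2*psi_2, using Cramer's rule."""
    (x1, y1), (x2, y2) = basis
    det = x1 * y2 - x2 * y1          # non-zero because the psi's are independent
    a1 = (psi[0] * y2 - x2 * psi[1]) / det
    a2 = (x1 * psi[1] - psi[0] * y1) / det
    return (a1, a2)


psi_a = (3.0, 2.0)
psi_b = (-1.0, 4.0)
rep_a = coordinates(psi_a)
rep_b = coordinates(psi_b)

# The representative of a sum is the sum of the representatives.
psi_sum = (psi_a[0] + psi_b[0], psi_a[1] + psi_b[1])
rep_sum = coordinates(psi_sum)
```

Choosing a different pair of independent vectors for `FUNDAMENTAL` gives a different pair of numbers for the same $\psi$, which is the sense in which each complete independent set defines its own representation.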

We shall now consider how an observable $\alpha$ is to be represented. If $\psi_q$ is any
fundamental $\psi$ of a representation of $\psi$'s, we can form the product $\alpha\psi_q$ and expand
it in terms of the fundamental $\psi$'s in the form (1) thus

$$\alpha\psi_q = \sum_p \psi_p\,\alpha_{pq}, \tag{2}$$

where the coefficients $\alpha_{pq}$ are numbers, which depend of course, as the notation
implies, on the suffix $q$ of the $\psi$ on the left-hand side. We have put the coefficients
$\alpha_{pq}$ in (2) on the right-hand side of their respective $\psi_p$'s, instead of following
the usual practice of putting coefficients on the left, so that the order of the two
suffixes may be more easily remembered. That suffix of $\alpha_{pq}$ which is nearer to
the $\psi$ is the same as the suffix of the $\psi$. This is an example of a rule which will
be used very extensively in the future.

Each observable $\alpha$ determines uniquely through equation (2) a set of
numbers $\alpha_{pq}$. Conversely, each set of numbers $\alpha_{pq}$ determines an observable $\alpha$.
There is thus a one-one correspondence between observables $\alpha$ and sets of
numbers $\alpha_{pq}$. These sets of numbers are the representatives of the observables.
The correspondence between the properties of the sets of numbers and those of
the observables will now be investigated.

Each set of numbers representing an observable is twofold, on account of the two
suffixes, and may most conveniently be written as a matrix array, each number
$\alpha_{pq}$ of the set being the element of the matrix in the $p$-th row and $q$-th column.
Thus each observable is represented by a matrix. The number of rows and columns
of the matrices is equal to the number of fundamental $\psi$'s of the representation
and one row and one column correspond to each fundamental $\psi$. A row and
column that correspond to the same fundamental $\psi$ correspond to one another.
An element of the matrix that lies in a row and column corresponding to one
another, i.e. an element of the type $\alpha_{pp}$, is called a diagonal element, since all such
elements lie on a diagonal of the matrix when the rows and columns are arranged
both in the same order.

If an observable $\alpha$ is represented by the matrix $\alpha_{pq}$, and an observable $\beta$ by
the matrix $\beta_{pq}$, then it is easily verified that the observable $\alpha + \beta$ is represented
by the matrix $\alpha_{pq} + \beta_{pq}$, and the observable $c\alpha$, where $c$ is a number, by $c\alpha_{pq}$.
These results may be expressed in symbols by the equations

$$(\alpha + \beta)_{pq} = \alpha_{pq} + \beta_{pq}, \tag{3}$$

$$(c\alpha)_{pq} = c\,\alpha_{pq}, \tag{4}$$

which are the ordinary rules for the addition of matrices and for the multiplication
of matrices by numbers. Again, if the product $\alpha\beta$ is represented by the matrix
$(\alpha\beta)_{pq}$, we have by definition

$$(\alpha\beta)\psi_q = \sum_p \psi_p\,(\alpha\beta)_{pq}. \tag{5}$$

But we have also

$$(\alpha\beta)\psi_q = \alpha(\beta\psi_q) = \alpha\sum_r \psi_r\,\beta_{rq} = \sum_r (\alpha\psi_r)\,\beta_{rq} = \sum_{pr} \psi_p\,\alpha_{pr}\beta_{rq}. \tag{6}$$

By equating the coefficients of $\psi_p$ in the right-hand sides of (5) and (6), which is
permissible since the $\psi$'s are all independent, we obtain

$$(\alpha\beta)_{pq} = \sum_r \alpha_{pr}\beta_{rq}. \tag{7}$$

Thus the matrix representing $\alpha\beta$ equals the matrix representing $\alpha$ multiplied
by the matrix representing $\beta$, according to the rule for matrix multiplication.
The particular arrangement of the suffixes of $\alpha_{pq}$ chosen in the defining equation (2)
is necessary in order that this rule of matrix multiplication may hold. If instead
of (2) we had put

$$\alpha\psi_q = \sum_p \alpha_{qp}\psi_p,$$

we should have found for the law of multiplication

$$(\alpha\beta)_{pq} = \sum_r \alpha_{rq}\beta_{pr}, \tag{8}$$

which is not so convenient as (7).

Equations (3), (4) and (7) show that the properties of observables of 
addition and multiplication are all faithfully reproduced by the properties of 
the matrices representing them, and justify our saying that the matrices do 
represent them. Matrices, like observables, satisfy all the laws of ordinary algebra 
except the commutative law of multiplication. 
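
The multiplication laws (7) and (8) can be compared numerically. The sketch below is only an illustration under assumed data: the two 3 by 3 integer arrays stand in for the representatives of two hypothetical observables, and the alternative suffix convention is modelled by transposing each array. None of the names or numbers come from the text.

```python
# A minimal sketch, assuming two observables on a space with three fundamental psi's.

def matmul(a, b):
    """Law (7): (ab)_pq = sum over r of a_pr * b_rq."""
    n = len(a)
    return [[sum(a[p][r] * b[r][q] for r in range(n)) for q in range(n)]
            for p in range(n)]


def transpose(a):
    """Swap the two suffixes of a square array."""
    return [list(row) for row in zip(*a)]


alpha = [[1, 2, 0], [0, 1, 3], [4, 0, 1]]   # hypothetical alpha_pq
beta = [[0, 1, 1], [2, 0, 1], [1, 1, 0]]    # hypothetical beta_pq

product = matmul(alpha, beta)               # representative of alpha*beta under (2)

# Under the alternative convention each representative has its suffixes
# interchanged, and the product then obeys law (8):
#   (alpha beta)_pq = sum over r of alpha_rq * beta_pr.
alt_alpha, alt_beta = transpose(alpha), transpose(beta)
n = len(alpha)
alt_product = [[sum(alt_alpha[r][q] * alt_beta[p][r] for r in range(n))
                for q in range(n)] for p in range(n)]

reversed_product = matmul(beta, alpha)      # differs from product in general
```

Here `alt_product` is the transpose of `product`, so the two conventions carry the same information, but only the first lets the representatives be multiplied in the same order as the observables.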

It has been mentioned that a number may be regarded as a special case of
an observable. The matrix representing a number $c$ has its elements $c_{pq}$ defined by

$$c\psi_q = \sum_p \psi_p\,c_{pq},$$

which gives

$$c_{pp} = c, \qquad c_{pq} = 0 \quad (p \neq q).$$

Thus the matrix representing $c$ is a diagonal matrix, i.e. all its elements vanish
except the diagonal ones, and further all the diagonal elements are equal to $c$.
We can put

$$c_{pq} = c\,\delta_{pq},$$

where the symbol $\delta_{pq}$ is defined by

$$\delta_{pp} = 1; \qquad \delta_{pq} = 0 \quad (p \neq q). \tag{9}$$

The numbers $\delta_{pq}$ are the elements of the matrix representing unity, which matrix
has the property that it leaves unchanged any matrix when multiplied into it on
either the left- or right-hand side.

We shall now obtain the law of multiplication of the representatives of
an observable and a $\psi$-symbol. Let $\psi$ be represented by the set of numbers $a_p$,
as according to (1), and let the $\psi$-symbol $\alpha\psi$, where $\alpha$ is any observable, be
represented by the set of numbers $b_q$, so that

$$\alpha\psi = \sum_q b_q\psi_q.$$

We have from (1)

$$\alpha\psi = \sum_p \alpha\psi_p\,a_p = \sum_{pq} \psi_q\,\alpha_{qp}\,a_p.$$

Hence, equating coefficients of $\psi_q$, we get

$$b_q = \sum_p \alpha_{qp}\,a_p, \tag{10}$$

which is the required multiplication law. It suggests that we should regard the set
of numbers $a_p$ as a matrix, having rows corresponding to the various fundamental
$\psi$'s of the representation, but having only one column. Equation (10) would then
be the law of multiplication of such a matrix with a square matrix $\alpha_{qp}$.
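
Law (10) may be tried out on small numbers. In the following sketch the arrays `alpha`, `beta` and the column `a` are hypothetical data chosen only for illustration; the check is that applying $\beta$ and then $\alpha$ to the column agrees with applying the product matrix formed by law (7).

```python
# A minimal sketch, assuming three fundamental psi's and integer representatives.

def matmul(a, b):
    """Law (7): (ab)_pq = sum over r of a_pr * b_rq."""
    n = len(a)
    return [[sum(a[p][r] * b[r][q] for r in range(n)) for q in range(n)]
            for p in range(n)]


def apply(alpha, a):
    """Law (10): b_q = sum over p of alpha_qp * a_p."""
    return [sum(alpha[q][p] * a[p] for p in range(len(a)))
            for q in range(len(alpha))]


alpha = [[1, 2, 0], [0, 1, 3], [4, 0, 1]]   # hypothetical alpha_qp
beta = [[0, 1, 1], [2, 0, 1], [1, 1, 0]]    # hypothetical beta_qp
a = [1, 0, 2]                               # hypothetical representative of psi

step_by_step = apply(alpha, apply(beta, a))     # alpha(beta psi)
all_at_once = apply(matmul(alpha, beta), a)     # (alpha beta) psi
```

Both routes give the same column, as the associative law for the abstract symbols requires.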

The correspondence that we have found between the properties of observables
and $\psi$-symbols and those of their representatives, which is exemplified in
equations (3), (4), (7) & (10), allows us to take over any equation between
the abstract symbols into an equation between the representatives. Suppose,
for instance, that we are given the equation

$$\alpha\beta\psi = \gamma\psi' + \psi'', \tag{11}$$

where $\alpha$, $\beta$ and $\gamma$ are three observables and $\psi$, $\psi'$ and $\psi''$ are three states.
By equating the representatives of each side of this equation, making use of
the law (10), we obtain

$$\sum_p (\alpha\beta)_{qp}\,a_p = \sum_p \gamma_{qp}\,a'_p + a''_q,$$

where $a_p$, $a'_p$ and $a''_p$ represent $\psi$, $\psi'$ and $\psi''$ respectively. From (7) we now get

$$\sum_{pr} \alpha_{qr}\beta_{rp}\,a_p = \sum_p \gamma_{qp}\,a'_p + a''_q.$$

Each symbol in the original equation (11) is here replaced by its representative,
occurring in the corresponding position. The suffixes are arranged according to
very simple and easily remembered rules, each consecutive pair of factors in any
term having a common suffix, the two positions of this suffix being consecutive in
the scheme of suffixes, while the suffix that occurs first in any term is the same for
every term. A summation is taken over each suffix that occurs twice in a term.
As examples of equations that can be taken over in this way may be mentioned
any of the equations between the abstract symbols occurring in the theory of
eigenvalues of the preceding chapter. Equation (1) of that chapter, for instance,
gives

$$\sum_q \alpha_{pq}\,a_q = a\,a_p.$$

If the matrix $\alpha_{pq}$ is known, then we have here an ordinary set of simultaneous
algebraic equations for the unknowns $a_p$ and also the unknown $a$. Any value of $a$
for which these equations have a solution (not identically zero) may be called
an eigenvalue of the matrix $\alpha_{pq}$. If we eliminate the unknowns $a_p$, in which
the equations are linear and homogeneous, we get the determinantal equation

$$\begin{vmatrix}
\alpha_{11} - a & \alpha_{12} & \alpha_{13} & \cdots \\
\alpha_{21} & \alpha_{22} - a & \alpha_{23} & \cdots \\
\alpha_{31} & \alpha_{32} & \alpha_{33} - a & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{vmatrix} = 0$$

to determine the eigenvalues $a$. The eigenvalues of a matrix representing
an observable must, of course, be the same as the eigenvalues of the observable
itself.
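
For a matrix with only two rows and columns the determinantal equation is a quadratic and can be solved directly. The sketch below is an illustration under assumed data: the Hermitian array `alpha` is hypothetical, and the function names are introduced here only for the example.

```python
import cmath

# A minimal sketch, assuming a representation with just two fundamental psi's.

def eigenvalues_2x2(m):
    """Roots a of det(m - a*1) = 0, i.e. a^2 - (trace) a + (determinant) = 0."""
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = cmath.sqrt(trace * trace - 4 * det)
    return ((trace - root) / 2, (trace + root) / 2)


def eigenvector_2x2(m, a):
    """A non-zero solution a_q of sum_q m_pq a_q = a a_p, assuming m[0][1] != 0."""
    return [m[0][1], a - m[0][0]]


def residual(m, a, v):
    """Largest absolute error in the two simultaneous equations."""
    return max(abs(sum(m[p][q] * v[q] for q in range(2)) - a * v[p])
               for p in range(2))


alpha = [[2, 1 - 1j], [1 + 1j, 3]]      # hypothetical Hermitian matrix
eigs = eigenvalues_2x2(alpha)
vectors = [eigenvector_2x2(alpha, a) for a in eigs]
errors = [residual(alpha, a, v) for a, v in zip(eigs, vectors)]
```

For this particular array the quadratic is $a^2 - 5a + 4 = 0$, so the two eigenvalues are real, in keeping with the matrix being Hermitian.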


21. Orthogonal Representations 


We have not yet considered how $\phi$-symbols are to be represented. We can always
treat $\phi$'s analogously to $\psi$'s, so that we can take any complete set of independent
$\phi$'s, $\phi_p$ say, and call them the fundamental $\phi$'s of a representation. If we then
expand an arbitrary $\phi$ in terms of them, thus

$$\phi = \sum_p a'_p\,\phi_p, \tag{13}$$

the set of numbers $a'_p$ will form the representative of this $\phi$. Again if $\alpha$ is any
observable, we can multiply it into a fundamental $\phi$, $\phi_p$, obtaining a product $\phi_p\alpha$,
which we can expand in terms of the fundamental $\phi$'s, thus

$$\phi_p\alpha = \sum_q \alpha_{pq}\,\phi_q. \tag{14}$$

The coefficients $\alpha_{pq}$ will then form the matrix that represents $\alpha$. It may easily be
verified that the matrix laws of addition and multiplication, equations (3), (4) and
(7), hold also for the representatives of observables in the present representation
in terms of fundamental $\phi$'s. It should be noticed that the arrangement of
the suffixes in $\alpha_{pq}$ requires the coefficients on the right-hand side of (14) to occur on
the left of their respective $\phi$-symbols, the opposite to what it was for equation (2).
The particular arrangement of suffixes chosen in (14), like that chosen in (2),
is necessary in order that we may have the multiplication law (7), which obeys
the suffix rule, instead of the multiplication law (8).

We can in this way get a representation of observables on the basis either of
a set of fundamental $\phi$'s or of a set of fundamental $\psi$'s. The question now arises
whether a set of fundamental $\phi$'s and a set of fundamental $\psi$'s can be such that
they both give the same representative for each observable. If this is so, we could
count them both as belonging to the same representation, so that we should
have one representation comprising representatives of both $\phi$'s and $\psi$'s as well
as observables. A necessary condition for the fundamental $\phi$'s and fundamental
$\psi$'s to give the same representatives for observables is that they shall be labelled
by the same set of suffixes $p, q, r, \ldots$, which suffixes will then label the rows and
columns of the matrices. Thus to each fundamental $\psi$ there will be a corresponding
fundamental $\phi$ having the same suffix. According to the notation that we have
used hitherto, when a $\psi$ and a $\phi$ have the same suffix they are conjugate imaginary
symbols denoting the same state, but this will now no longer hold.

We have already used the same suffixes for the fundamental $\phi$'s in (14) as
for the fundamental $\psi$'s in (2), so that we can investigate the consequences of
these equations on the assumption that the coefficients $\alpha_{pq}$ are the same in each,
for every observable $\alpha$. If in (14) we change the summed suffix $q$ to $r$ and then
multiply by $\psi_q$ on the right, we obtain

$$\phi_p\alpha\psi_q = \sum_r \alpha_{pr}\,\phi_r\psi_q. \tag{15}$$

Similarly, if in (2) we change the summed suffix $p$ to $r$ and then multiply by $\phi_p$ on
the left, we obtain

$$\phi_p\alpha\psi_q = \sum_r \phi_p\psi_r\,\alpha_{rq}. \tag{16}$$

The right-hand sides of equations (15) and (16) can be equal for an arbitrary
observable $\alpha$, i.e. for arbitrary $\alpha_{pq}$'s, only provided

$$\phi_p\psi_q = 0 \quad (p \neq q) \tag{17}$$

and

$$\phi_p\psi_p = c,$$

where $c$ is a number independent of $p$. We may without essential loss of generality
take $c = 1$, so that we have

$$\phi_p\psi_p = 1. \tag{18}$$

Equations (17) and (18) can be combined in the single equation

$$\phi_p\psi_q = \delta_{pq}. \tag{19}$$

This is the condition that a set of fundamental $\phi$'s and a set of fundamental $\psi$'s
may both be considered as belonging to one representation.

With the help of these conditions we can easily obtain explicit expressions for
the coefficients in the expansions. Thus to determine the coefficients $a_p$ occurring
in (1) for the expansion of an arbitrary $\psi$ we have

$$\phi_q\psi = \phi_q\sum_p a_p\psi_p = \sum_p a_p\,\delta_{qp} = a_q. \tag{20}$$

Similarly the general coefficient $a'_q$ in (13) for the expansion of an arbitrary $\phi$ is

$$a'_q = \phi\psi_q. \tag{21}$$

Again, from (14) we obtain

$$\phi_p\alpha\psi_r = \sum_q \alpha_{pq}\,\phi_q\psi_r = \alpha_{pr}, \tag{22}$$

which gives explicitly the elements of the matrix representing any observable.
This result could also have been obtained from (2).

In obtaining a general representation for both $\phi$'s and $\psi$'s as well as observables,
we have had to abandon the notation of a $\phi$ and $\psi$ which have the same suffix
being conjugate imaginary symbols denoting the same state, and this results in
the representation being inconvenient and not very useful. The fundamental $\phi$'s
and $\psi$'s may, however, be such that each fundamental $\phi$ and $\psi$ with the same
suffix are really conjugate imaginary symbols denoting the same state, in which
special case there is no need to abandon this notation. Such a representation is
a particularly useful one. It is called an orthogonal representation. The set of
states denoted either by the fundamental $\phi$'s or by the fundamental $\psi$'s may be
called the fundamental states of the representation. The condition (17) shows that
these fundamental states are all orthogonal to each other and condition (18) shows
that the $\phi$'s and $\psi$'s representing them are normalized.

The vector picture of $\phi$'s and $\psi$'s provides us with a simple geometrical
interpretation of an orthogonal representation. In this vector picture each
$\phi$-symbol and the conjugate imaginary $\psi$-symbol are to be pictured as conjugate
complex vectors. We can without inconsistency suppose that each fundamental $\phi$
and the conjugate imaginary fundamental $\psi$ of an orthogonal representation are
to be both pictured by the same real vector. Condition (17) now shows that
these real vectors are all mutually perpendicular and condition (18) that they
are each of unit length, so that they form the basis for a rectangular Cartesian
system of co-ordinates. The numbers representing an arbitrary $\phi$ or $\psi$ are now
its co-ordinates in this system. Since the system of co-ordinates is real, a $\phi$ and
the conjugate imaginary $\psi$, pictured as conjugate complex vectors, should have
conjugate complex co-ordinates, and thus they should be represented by conjugate
complex sets of numbers. It is easily verified, by comparing equations (20) and (21),
that this is the case. Thus a state is represented by the same set of numbers
whether it is denoted by a $\phi$ or a $\psi$, apart from an uncertainty in the sign of $i$.

If $\alpha$ is a real observable, then from equation (22) we find that the elements of
the matrix representing it satisfy

$$\alpha_{pr} = \overline{\alpha_{rp}}$$

in the case of an orthogonal representation. A matrix for which this condition
holds is called Hermitian. If in addition all the matrix elements are real, we have
$\alpha_{pr} = \alpha_{rp}$, i.e. the matrix is symmetrical. From (22) we also find that a diagonal
element $\alpha_{pp}$ is equal to the average value, according to §11, of the observable for
the corresponding fundamental state $\psi_p$. If $\alpha$ is not a real observable, then its
conjugate complex observable $\bar{\alpha}$, defined in §10, has matrix elements to represent
it, given by

$$\bar{\alpha}_{pr} = \overline{\alpha_{rp}}. \tag{23}$$

The matrix $\bar{\alpha}_{pr}$ may be called the conjugate complex matrix to $\alpha_{pr}$.
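
Equations (19) to (22) and the Hermitian condition can be checked on a small numerical model. The sketch below rests on assumptions that are not in the text: the $\psi$'s are modelled as pairs of complex numbers, the product $\phi\psi$ is computed by conjugating the components of the $\psi$ of which the $\phi$ is the conjugate imaginary, and the basis and the operator `ALPHA` are hypothetical.

```python
import math

# A minimal sketch, assuming psi's are pairs of complex numbers.

def inner(u, v):
    """Product phi_u psi_v, phi_u being the conjugate imaginary of psi_u."""
    return sum(x.conjugate() * y for x, y in zip(u, v))


def apply(op, v):
    """Act with an operator, given as a 2x2 array, on a psi."""
    return [sum(op[i][j] * v[j] for j in range(len(v))) for i in range(len(op))]


S = 1 / math.sqrt(2)
BASIS = [[S + 0j, S * 1j], [S + 0j, -S * 1j]]        # hypothetical fundamental psi's
ALPHA = [[1 + 0j, 2 - 1j], [2 + 1j, -1 + 0j]]        # hypothetical real observable

# Condition (19): phi_p psi_q = delta_pq.
gram = [[inner(BASIS[p], BASIS[q]) for q in range(2)] for p in range(2)]

# Equation (22): alpha_pr = phi_p alpha psi_r.
rep = [[inner(BASIS[p], apply(ALPHA, BASIS[r])) for r in range(2)]
       for p in range(2)]

psi = [1 + 2j, 3 + 0j]                               # an arbitrary psi
a = [inner(BASIS[q], psi) for q in range(2)]         # equation (20)
a_prime = [inner(psi, BASIS[q]) for q in range(2)]   # equation (21)

# Expansion (1) rebuilt from the coefficients a_q.
rebuilt = [sum(a[q] * BASIS[q][i] for q in range(2)) for i in range(2)]
```

With these choices `rep` comes out Hermitian, its diagonal elements are real, and `a_prime` is the conjugate complex of `a`, as the text states for an orthogonal representation.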


22. The δ Function


We have assumed throughout the above investigation of representations that
the number of fundamental $\psi$'s, if not finite, is at most enumerably infinite, so that
each of them can be labelled by a suffix $p$ taking only a discrete set of values.
For most dynamical systems of interest this condition is not fulfilled, the total
number of independent states being infinite and equal to the number of points on
a line. In such cases we must label each of the fundamental $\psi$'s by a suffix $p$ that
can assume any value in a certain range. The condition (1), which expresses that
any $\psi$ is a linear function of the fundamental $\psi$'s, must now be rewritten with
an integral instead of a sum, thus

$$\psi = \int a_p\psi_p\,dp. \tag{24}$$

The domain of integration is to be understood to be the whole range of $p$ used for
labelling fundamental $\psi$'s. The coefficients $a_p$ form a function of the continuous
variable $p$.

It is not strictly true that every $\psi$ can be expressed in the form (24) when
the coefficients $a_p$ are restricted to be finite, which is, of course, implied when
one says they form a function of the continuous variable $p$. An example of
a $\psi$ that cannot be expressed in this form is one of the fundamental $\psi$'s, $\psi_q$
say, itself. Another example is $\partial\psi_q/\partial q$ when $\psi_q$ involves the parameter $q$ in
a manner sufficiently continuous for this differential coefficient to exist. It would
be inconvenient if throughout the subsequent theory we were continually being
reminded of the fact that there are exceptional $\psi$'s which cannot be expressed in
the form (24). We get over the difficulty by allowing infinities of certain types
to occur in the coefficients $a_p$, which enables every $\psi$ formally to be expressed in
the required form. This is analogous to the device sometimes used in geometry,
of avoiding the exception of parallel lines to the rule that two straight lines always
meet in one point, by saying that parallel lines meet in a point at infinity.

We observe that those $\psi$'s that are not of the form of the right-hand side of (24)
with finite $a_p$ can always be regarded as limits of $\psi$'s that are of this form. We can,
for instance, express $\psi_q$ by

$$\psi_q = \lim_{\epsilon \to 0} \int a_{pq\epsilon}\,\psi_p\,dp.$$

As one approaches the limit, $a_{pq\epsilon}$ becomes a function of $p$ which vanishes for all
values of $p$ except those very close to $q$ and which is so large for values of $p$ in
the immediate neighbourhood of $q$ that its integral is unity. We can now say
formally that

$$\psi_q = \int a_p\psi_p\,dp, \tag{24$'$}$$

where

$$a_p = \lim_{\epsilon \to 0} a_{pq\epsilon}.$$


This $a_p$, we can say, is an improper function of the variable $p$, having the value
zero for all values of $p$ except $q$ and the value infinity for $p = q$, the infinity being
such that its integral is unity. It is thus a function of the two variables $p$ and $q$
which depends only on their difference, so that we can put

$$a_p = \delta(p - q), \tag{25}$$

where the improper function $\delta(x)$ is defined by

$$\int_{-\infty}^{\infty} \delta(x)\,dx = 1, \qquad \delta(x) = 0 \quad (\text{for } x \neq 0).$$


The introduction of the $\delta$ function into our analysis will not be in itself
a source of lack of rigour in the theory since any equation involving the $\delta$
function can be transcribed into an equivalent but usually more cumbersome
form in which the $\delta$ function does not appear. The $\delta$ function is thus merely
a convenient notation. The only lack of rigour in the theory arises from the fact
that we perform operations on the abstract symbols, such as differentiation and
integration with respect to parameters occurring in them, which are not rigorously
defined. When these operations are permissible, the $\delta$ function may be used freely
for dealing with the representatives of the abstract symbols, as though it were
a continuous function, without leading to incorrect results. We can, in fact, even
give a meaning to the $\delta$ function of an observable, provided it has a continuous
range of eigenvalues, by means of the general definition of §15.

Certain elementary properties of the $\delta$ function, which are deducible from, or at
least consistent with, the definition, should be noted, namely,

$$\delta(-x) = \delta(x), \qquad x\,\delta(x) = 0 \tag{26}$$

and

$$\int_{-\infty}^{\infty} f(x)\,\delta(x - a)\,dx = f(a), \tag{27}$$

where $f(x)$ is any continuous function of $x$ and $a$ is any number, and the range
of integration is any range through the point $a$, the limits $\infty$ and $-\infty$ being put
down merely for definiteness. Thus the operation of multiplying by $\delta(x - a)$ and
integrating with respect to $x$ is equivalent to the operation of substituting $a$ for $x$.
This is still true when the operation is applied, not to an ordinary function $f(x)$
of $x$, but to a $\psi$-symbol or an observable involving the parameter $x$, provided it is
reasonably continuous in $x$. We are, in fact, making an application of this rule,
with the $\psi$-symbol $\psi_p$ for $f(x)$ and the number $q$ for $a$, when we assert that (24$'$)
holds with $a_p$ defined by (25). A further property of the $\delta$ function is

$$\int_{-\infty}^{\infty} \delta(a - x)\,dx\,\delta(x - b) = \delta(a - b). \tag{28}$$


To prove this relation we regard the left-hand side as a function of the number $b$
and put it equal to $F(b)$. We see at once that $F(b) = 0$ if $b$ is not equal to $a$, and
also we have

$$\int_{-\infty}^{\infty} F(b)\,db = \int_{-\infty}^{\infty} \delta(a - x)\,dx \int_{-\infty}^{\infty} \delta(x - b)\,db = \int_{-\infty}^{\infty} \delta(a - x)\,dx = 1.$$

Thus $F(b)$ satisfies all the conditions that define $\delta(b - a)$ and may hence be put
equal to $\delta(b - a)$ or $\delta(a - b)$. Equation (28) would have been obtained from
equation (27) if for $f(x)$ we had substituted the improper function $\delta(x - b)$. This is
an example which illustrates how a $\delta$ function may be used as though it were a
continuous function without leading to incorrect results.
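
The substitution property (27) can be seen at work numerically if the improper function is replaced by a very narrow but finite peak. The following sketch makes that replacement with a normalized Gaussian of width `EPS`; this particular regularization, the midpoint integration rule, and all names are assumptions of the example and are not part of the text.

```python
import math

# A minimal sketch, assuming delta(x) is approximated by a narrow Gaussian.

def delta_eps(x, eps):
    """Normalized Gaussian of width eps, standing in for delta(x)."""
    return math.exp(-0.5 * (x / eps) ** 2) / (eps * math.sqrt(2 * math.pi))


def integrate(g, lo, hi, n):
    """Midpoint rule with n equal steps on the range lo to hi."""
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))


EPS = 0.01      # assumed width of the peak
A = 0.5         # the point a of equation (27)


def sift(f, a, lo, hi, n):
    """Approximate the integral of f(x) * delta(x - a) over the given range."""
    return integrate(lambda x: f(x) * delta_eps(x - a, EPS), lo, hi, n)


total = integrate(lambda x: delta_eps(x, EPS), -1.0, 1.0, 40000)
narrow = sift(math.cos, A, 0.0, 1.0, 20000)      # a short range through a
wide = sift(math.cos, A, -3.0, 3.0, 120000)      # a longer range through a
```

Both `narrow` and `wide` come out close to `math.cos(A)`, and they agree with each other, which illustrates the remark that any range of integration through the point $a$ will do. The small remaining discrepancy from $\cos a$ shrinks as `EPS` is made smaller.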

In order to put $\partial\psi_q/\partial q$ in the form of the right-hand side of (24) it is necessary
to use the derivative $\delta'(x)$ of the function $\delta(x)$. This derivative is, of course, an even
more discontinuous and improper function than $\delta(x)$ itself, but in many cases it
can also be used freely as though it were a continuous function of $x$ without leading
to incorrect results. It has the elementary properties

$$\delta'(-x) = -\delta'(x), \qquad x\,\delta'(x) = -\delta(x) \tag{29}$$

and

$$\int_{-\infty}^{\infty} f(x)\,\delta'(x - a)\,dx = -f'(a), \tag{30}$$

for any differentiable function $f$ of $x$, which may be a $\psi$-symbol or observable
involving $x$ as a parameter. The second and third of these relations may be
obtained by differentiating (26) with respect to $x$ and (27) with respect to $a$
respectively. The third one may also be verified by an integration by parts, thus

$$\int_{-\infty}^{\infty} f(x)\,\delta'(x - a)\,dx = \Big[f(x)\,\delta(x - a)\Big]_{-\infty}^{\infty} - \int_{-\infty}^{\infty} f'(x)\,\delta(x - a)\,dx = -f'(a)$$

from (27). A further property is

$$\int_{-\infty}^{\infty} \delta'(a - x)\,dx\,\delta(x - b) = \delta'(a - b), \tag{31}$$

which may be obtained by differentiation of (28) with respect to $a$. It may also be
obtained from (27) if one puts $b$ for $a$ and then takes $\delta'(a - x)$ for $f(x)$, and is then
an example of how the $\delta'$ function may be used as though it were a continuous
function.

If for $f(x)$ in (30) we put $\psi_p$, $p$ being the variable instead of $x$, and if we put
$q$ for $a$, we get

$$\int \psi_p\,\delta'(p - q)\,dp = -\partial\psi_q/\partial q.$$

This shows that $\partial\psi_q/\partial q$ may be expressed in the form of the right-hand side of (24)
with $-\delta'(p - q)$ for $a_p$. By making use of higher derivatives of the $\delta$ function, one
can express $\partial^2\psi_q/\partial q^2$, $\partial^3\psi_q/\partial q^3$, &c., also in this form.
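
Properties (29) and (30) of the derivative can be checked in the same approximate way as (27). In the sketch below the improper functions are replaced by a narrow Gaussian and its derivative; this regularization, the integration rule and the names are assumptions made for illustration only.

```python
import math

# A minimal sketch, assuming delta and delta' are approximated by a narrow
# Gaussian and its derivative.

def delta_eps(x, eps):
    """Normalized Gaussian of width eps, standing in for delta(x)."""
    return math.exp(-0.5 * (x / eps) ** 2) / (eps * math.sqrt(2 * math.pi))


def delta_prime_eps(x, eps):
    """Derivative of delta_eps with respect to x, standing in for delta'(x)."""
    return -x / eps ** 2 * delta_eps(x, eps)


def integrate(g, lo, hi, n):
    """Midpoint rule with n equal steps on the range lo to hi."""
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))


EPS = 0.01      # assumed width of the peak
A = 0.3         # the point a of equation (30)

# Equation (30) with f = sin: the integral should be close to -cos(A).
lhs_30 = integrate(lambda x: math.sin(x) * delta_prime_eps(x - A, EPS),
                   A - 10 * EPS, A + 10 * EPS, 20000)

# Second relation of (29), tested under an integral with f = cos:
# the integral of f(x) * x * delta'(x) should be close to -f(0) = -1.
lhs_29 = integrate(lambda x: math.cos(x) * x * delta_prime_eps(x, EPS),
                   -10 * EPS, 10 * EPS, 20000)
```

The relation $x\,\delta'(x) = -\delta(x)$ does not hold point by point for the Gaussian stand-in; it holds only in the sense that both sides give nearly the same result when multiplied by a continuous function and integrated, which is the sense in which such equations are used in the text.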


23. Case of a Continuous Range of Fundamental States


We can now generalize the theory of the representation of states and observables
to apply to systems for which the number of independent states is equal to
the number of points on a line. The $\psi$ on the left-hand side of (24) will be
represented by the numbers $a_p$ that occur as coefficients on the right-hand side,
or by the function $a_p$ of the continuous variable $p$. Also if $\alpha$ is any observable,
corresponding to (2) we can expand $\alpha\psi_q$ in the form

$$\alpha\psi_q = \int \psi_p\,dp\,\alpha_{pq}, \tag{32}$$

where the $\alpha_{pq}$ are numbers, and these numbers, which form a function of the two
continuous variables $p$ and $q$, will then represent the observable $\alpha$. It is sometimes
convenient to call this function of two variables a matrix, in order that one may use
the same words in talking about the case (32) as about the case (2). The number
of rows and columns of such a matrix is equal to the number of points on a line.
Corresponding to the multiplication law (7), we now have

$$(\alpha\beta)_{pq} = \int \alpha_{pr}\,dr\,\beta_{rq}, \tag{33}$$

which may be proved in an analogous way. Similarly, corresponding to (10),
we now have that the function $b_p$ of $p$ representing $\alpha\psi$ is given in terms of $a_p$,
that representing $\psi$, by the relation

$$b_q = \int \alpha_{qp}\,dp\,a_p. \tag{34}$$

If we regard the number $c$ as an observable, its representative $c_{pq}$ will,
by definition, be given by

$$c\psi_q = \int \psi_p\,dp\,c_{pq}, \tag{35}$$

so that

$$c_{pq} = c\,\delta(p - q). \tag{36}$$

The matrix representing unity is now that whose general element is $\delta(p - q)$ and
it still, of course, has the property of leaving unchanged any matrix when multiplied
into it on either the right- or left-hand side. If we compare these results with
the corresponding ones for the case of discrete fundamental $\psi$'s, we see that the only
difference is that the two-suffix $\delta$-symbol, defined by (9), is replaced by the $\delta$
function of the difference of the two suffixes. It is a general rule that the two-suffix
$\delta$-symbol is always to be replaced by the $\delta$ function in this way when one passes
from the case of sums to the case of integrals.

The connexion between the fundamental $\psi$'s and the fundamental $\phi$'s of
the same representation is now

$$\phi_p\psi_q = \delta(p - q), \tag{37}$$

which is obtained from (19) by replacing the two-suffix $\delta$-symbol according to
the rule. This condition (37) implies that $\phi_p\psi_p$ is infinite. Thus the law of §8 that
any $\phi$-symbol can be multiplied into any $\psi$-symbol, giving a number as product,
must be relaxed to allow the possibility of the product being infinite.

When each fundamental $\phi$ and fundamental $\psi$ with the same suffix are
conjugate imaginary symbols denoting the same state, we have, as before,
an orthogonal representation. We shall now consider the meaning of equation (37)
for an orthogonal representation. This condition (37) may be split up into the
two conditions

$$\phi_p\psi_q = 0 \quad (p \neq q), \tag{38}$$

$$\int \phi_p\psi_q\,dp = 1. \tag{39}$$

The first of these, corresponding to (17), again expresses that any two fundamental
states are orthogonal. The second, corresponding to (18), is sometimes taken as
the definition of the normalization of $\psi_q$ when the suffix $q$ labelling the independent
states $\psi_q$ takes on a continuous range of values, instead of the condition $\phi_q\psi_q = 1$,
which would now be mathematically useless, as it would require the $\phi$'s and $\psi$'s
in (37) to be all multiplied by infinitely small coefficients. If, however, one changes
the definition of normalization in this way, one must remember that the laws
for the physical interpretation of the theory hold only for the old definition.
The general law given at the end of §11, that $\phi_q\alpha\psi_q$ is the average value of
the observable $\alpha$ for the state $\psi_q$ provided $\phi_q\psi_q = 1$, is of universal applicability,
for the continuous as well as for the discrete case. It is true that for the continuous
case $\phi_q\alpha\psi_q$ will in general be zero when $\phi_q\psi_q = 1$, but, as the applications of
the theory will show, this is what the physics then requires. Only the ratios of
the averages of different observables are then of interest, and for the calculation of
these the normalizing condition (39) is useful.

With the help of (37) we obtain from (24)

$$\phi_q\psi = \phi_q \int a_p\psi_p\,dp = \int a_p\,\delta(p - q)\,dp = a_q, \tag{40}$$

by making an application of (27). This result, corresponding to (20),
gives explicitly the coefficients on the right-hand side of (24) representing $\psi$.
The conjugate imaginary $\phi$ is represented by the numbers $a'_q = \phi\psi_q$, corresponding
to (21), which are the conjugate complex numbers to $a_q$ in the case of an orthogonal
representation. Again, from (32) we obtain

$$\phi_r\alpha\psi_q = \int \phi_r\psi_p\,dp\,\alpha_{pq} = \int \delta(r - p)\,dp\,\alpha_{pq} = \alpha_{rq}, \tag{41}$$

which, corresponding to (22), gives explicitly the elements of the matrix
representing $\alpha$. We no longer, however, have the result that a diagonal element
$\alpha_{qq}$ is for an orthogonal representation the average value of $\alpha$ for the state $\psi_q$,
since the normalizing condition (37) which is here used is not the correct one for
physical interpretation. This result would give, if, for example, we took $\alpha$ equal to
unity, the value $\delta(q - q) = \infty$, whereas the average value of unity must of course
be unity.


24. The Weight Function 


It is sometimes convenient to modify equations (24) & (32), which define
the representatives of a state and observable, by the introduction of a weight
function. We can take any function $\rho_p$ of the variable $p$ which is defined throughout
the range of $p$ used for labelling the fundamental states and which has no zero
values, and put instead of (24) & (32)

$$\psi = \int a_p\,\psi_p\,\rho_p\,dp, \tag{42}$$

$$\alpha\psi_q = \int \psi_p\,\rho_p\,dp\,\alpha_{pq}. \tag{43}$$

We can now consider the new coefficients $a_p$ and $\alpha_{pq}$ to be the representatives
of the state and observable. This does not give any essential generalization of
the theory of representation, since the new representatives are connected with
the original ones by very simple relations. It is merely a device which is convenient
in certain applications of the theory, usually for increasing the symmetry
of the equations, or for making more direct the physical interpretation of
the representatives which will be given in §28. We could, of course, adopt the same
device in the case of a discrete set of fundamental states, but there do not seem
to be any examples for which it is then of any value.

When the weight function $\rho$ is introduced it must appear, not only in
the expansions (42) & (43), but in all formulas which involve an integral over
the parameter $p$ that labels the fundamental $\psi$'s, e.g. in the multiplication law for
two observables, equation (33), which becomes

$$(\alpha\beta)_{pq} = \int \alpha_{pr}\,\rho_r\,dr\,\beta_{rq},$$

and in that for an observable with a $\psi$, equation (34), which becomes

$$b_q = \int \alpha_{qp}\,\rho_p\,dp\,a_p.$$

Again, the number $c$, regarded as an observable, is no longer represented by
the right-hand side of (36), since instead of (35) we now have

$$c\psi_q = \int \psi_p\,\rho_p\,dp\,c_{pq},$$

which gives

$$c_{pq} = c\,\rho_p^{-1}\,\delta(p - q) = c\,\rho_q^{-1}\,\delta(p - q).$$

The unit matrix is thus changed from $\delta(p - q)$ to $\rho_p^{-1}\delta(p - q)$. This suggests that
equation (37) should be changed to

$$\phi_p\psi_q = \rho_p^{-1}\,\delta(p - q), \tag{44}$$

a conclusion which is confirmed when one notes that the normalizing equation (39)
must be changed to

$$\int \phi_p\psi_q\,\rho_p\,dp = 1. \tag{45}$$

We can now see what changes must be made in the representatives of states
and observables when the weight function is introduced. If we multiply the $\phi_p$ and
$\psi_p$ of equations (37) and (38) by $\rho_p^{-1/2}$, they will then satisfy equations (44) and (45).
We must then multiply the $a_p$ of equation (24) by $\rho_p^{-1/2}$ in order that it may satisfy
(42) and the $\alpha_{pq}$ of equation (32) by $(\rho_p\rho_q)^{-1/2}$ in order that it may satisfy (43).
These results are particular cases of the general rule that any symbol involving
the suffixes $p, q, \ldots$ gets multiplied by $(\rho_p\rho_q\ldots)^{-1/2}$ when the weight function is
introduced. From this rule one can see the necessity for the insertion of the factor
$\rho_p$ in every integral with respect to the variable $p$, when one bears in mind that
the integrand must contain the suffix $p$ twice.
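
The bookkeeping of the weight function is easiest to follow on a discrete analogue, with sums in place of integrals. The text remarks that the device has no practical value in the discrete case, so the sketch below is offered only as an arithmetic check of the rule; the weights `RHO`, the array `alpha` and the column `a` are hypothetical.

```python
import math

# A minimal sketch: discrete analogue of the weight function, with three
# fundamental states and assumed positive weights rho_p.

RHO = [1.0, 4.0, 0.25]


def weighted_vector(a):
    """Rule of the text for one suffix: a_p is multiplied by rho_p^(-1/2)."""
    return [a[p] / math.sqrt(RHO[p]) for p in range(len(a))]


def weighted_matrix(m):
    """Rule for two suffixes: m_pq is multiplied by (rho_p rho_q)^(-1/2)."""
    n = len(m)
    return [[m[p][q] / math.sqrt(RHO[p] * RHO[q]) for q in range(n)]
            for p in range(n)]


def apply_weighted(m_w, a_w):
    """Weighted form of (34): b_q = sum over p of m_qp * rho_p * a_p."""
    n = len(a_w)
    return [sum(m_w[q][p] * RHO[p] * a_w[p] for p in range(n)) for q in range(n)]


alpha = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [4.0, 0.0, 1.0]]
a = [1.0, 0.0, 2.0]

# Unweighted law (10).
b = [sum(alpha[q][p] * a[p] for p in range(3)) for q in range(3)]

# The same product computed entirely with weighted representatives.
b_w = apply_weighted(weighted_matrix(alpha), weighted_vector(a))

# Weighted representative of unity: rho_p^(-1) on the diagonal.
identity = [[1.0 if p == q else 0.0 for q in range(3)] for p in range(3)]
unit_w = weighted_matrix(identity)
```

The result `b_w` equals the weighted version of `b`, and `unit_w` has $\rho_p^{-1}$ on its diagonal, which is the discrete counterpart of the change of the unit matrix from $\delta(p - q)$ to $\rho_p^{-1}\delta(p - q)$.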


25. General Case of Representation 


In most of the applications of quantum mechanics the atomic system dealt with
has a still larger number of independent states than we have hitherto considered.
The fundamental states of a representation can then be labelled conveniently only
by means of several suffixes $p_1, p_2, \ldots, p_n$, which can take on any values within
a given domain of the $n$-dimensional $p$-space. The generalizations which must now
be made in the preceding theory are quite obvious. We must have, for instance,
instead of (24) and (32), the expansions


$$\psi = \int\!\!\int\!\cdots\, \psi_{p_1p_2\ldots}\,dp_1\,dp_2\cdots a_{p_1p_2\ldots}, \qquad (46)$$

$$\alpha\psi_{q_1q_2\ldots} = \int\!\!\int\!\cdots\, \psi_{p_1p_2\ldots}\,dp_1\,dp_2\cdots \alpha_{p_1p_2\ldots q_1q_2\ldots}. \qquad (47)$$


A state $\psi$ is now represented by $a_{p_1p_2\ldots}$, a function of the $n$ variables $p_1, p_2, \ldots$,
and an observable $\alpha$ by $\alpha_{p_1p_2\ldots q_1q_2\ldots}$, a 'matrix' whose rows and columns are both
labelled by these same variables. The $\psi$-symbol $\psi_{q_1q_2\ldots}$, one of the fundamental
states, is represented by


$$\delta(p_1-q_1)\,\delta(p_2-q_2)\cdots\delta(p_n-q_n), \qquad (48)$$


as may easily be verified by substituting this expression for $a_{p_1p_2\ldots}$ in (46) and
carrying out the integrations one by one with the help of (27). It is always this
product (48) that replaces the $\delta(p-q)$ of the one-dimensional case. In the same
way the $\psi$-symbol


$$\frac{\partial}{\partial q_m}\psi_{q_1q_2\ldots} \qquad (m = 1, 2, \ldots, n)$$


is represented by


$$-\delta(p_1-q_1)\,\delta(p_2-q_2)\cdots\delta(p_{m-1}-q_{m-1})\,\delta'(p_m-q_m)\,\delta(p_{m+1}-q_{m+1})\cdots\delta(p_n-q_n), \qquad (49)$$


as may easily be verified with the help of (30). This expression differs from (48),
apart from the minus sign, only in the $m$-th factor.

We must make a still further generalization in order to include all the cases
of representation that occur in practice, namely, we must allow both sums and
integrals to occur together. In the one-dimensional case, for instance, we can have


$$\psi = \sum_P a_P\psi_P + \int \psi_p\,dp\,a_p. \qquad (50)$$


The discrete set of numbers $a_P$ together with the continuous set $a_p$ now
represent the state $\psi$. They may be considered as a function of a variable
whose domain consists of a continuous range together with some discrete points.
In the many-dimensional case we can have sums for some of the variables and
integrals for others. The general rule applying to every case of representation is
that a state is represented by a function whose domain is such that every point
of it corresponds to one of the fundamental states. There is no restriction on
the number of points in this domain or on their arrangement in the $p$-space
that labels them. Thus the domain may consist of discrete points together
with a number of continuous regions each having any number of dimensions.
An observable is represented by a matrix whose rows and whose columns are
in one-one correspondence with the points of this domain.

The equations of our previous theory of representation can all be taken over
without difficulty, but cannot very well be written down in a form that includes
all cases without an elaborate notation. We shall therefore take simply the case
when (50) holds as an illustration. Corresponding to (2) and (32), we now have,
for the definition of the representative of an observable,


$$\alpha\psi_Q = \sum_P \psi_P\,\alpha_{PQ} + \int \psi_p\,dp\,\alpha_{pQ},$$
$$\alpha\psi_q = \sum_P \psi_P\,\alpha_{Pq} + \int \psi_p\,dp\,\alpha_{pq}. \qquad (51)$$


There are thus four kinds of coefficients in the representative of an observable,
typified by $\alpha_{PQ}$, $\alpha_{Pq}$, $\alpha_{pQ}$ and $\alpha_{pq}$, corresponding to the different cases of discrete or
continuous values for the suffixes. Again, corresponding to (7) and (33), we now
have for the multiplication law for the representatives of observables,


$$(\alpha\beta)_{PQ} = \sum_R \alpha_{PR}\,\beta_{RQ} + \int \alpha_{Pr}\,dr\,\beta_{rQ},$$

$$(\alpha\beta)_{Pq} = \sum_R \alpha_{PR}\,\beta_{Rq} + \int \alpha_{Pr}\,dr\,\beta_{rq},$$

$$(\alpha\beta)_{pQ} = \sum_R \alpha_{pR}\,\beta_{RQ} + \int \alpha_{pr}\,dr\,\beta_{rQ},$$

$$(\alpha\beta)_{pq} = \sum_R \alpha_{pR}\,\beta_{Rq} + \int \alpha_{pr}\,dr\,\beta_{rq}.$$


In each case there is a sum over $R$ and an integral over $r$. The conditions (19) and
(37) become


$$\phi_P\psi_Q = \delta_{PQ}, \qquad \phi_P\psi_q = 0,$$
$$\phi_p\psi_Q = 0, \qquad \phi_p\psi_q = \delta(p-q).$$


These examples are sufficient to show how each equation is to be interpreted in
any of the various kinds of representation that may arise.

We can make a final generalization by introducing a weight function in
the general case. This weight function $\rho$ may be an arbitrary function of the
variables $p_m$ that label the fundamental $\psi$'s, provided it never vanishes. It will
always appear along with the differentials $dp_m$ in any integration and will also
appear, to the power of $-1$, in the unit matrix.


V. TRANSFORMATION THEORY 


26. Eigenstates as Fundamental States of 
a Representation 


IN the preceding chapter the idea of a representation of the abstract symbols 
was introduced and was treated entirely from a general mathematical point 
of view, the representatives being like co-ordinates of the symbols referred to 
a general co-ordinate system. We must now consider particular representations, 
i.e. co-ordinates referred to particular co-ordinate systems, which must be 
singled out and specified in a certain way. We shall find, incidentally, that 
our representatives now often have direct physical interpretations. We shall 
be concerned here and throughout the future work only with orthogonal 
representations. 

An orthogonal representation is built up on the basis of a complete set of 
orthogonal states, forming the fundamental states. Such a set of states is obtained 
most easily with the help of the theory of eigenvalues of Chapter III. If we take 
a set of real observables that all commute, their simultaneous eigenstates form 
a complete set and any two belonging to two different sets of eigenvalues are 
orthogonal. If the set of commuting observables is a complete one, then, as 
shown in §17, there is only one eigenstate for each set of eigenvalues, so that
the eigenstates must now all be orthogonal. These eigenstates can therefore be
taken to be the fundamental states of a representation. Each of them is associated
with one set of eigenvalues, which may conveniently be used for labelling it, instead
of the arbitrary suffixes $p_m$ of the preceding chapter, which have no physical
meaning. Thus if the commuting observables are $\xi_1, \xi_2, \ldots, \xi_n$ and if we denote
the eigenvalues of $\xi_m$ by $\xi'_m, \xi''_m, \ldots$, a fundamental $\psi$ may be written $\psi(\xi'_1\xi'_2\ldots\xi'_n)$,
or simply $\psi(\xi')$ for brevity. In the same way a fundamental $\phi$ may be written $\phi(\xi')$.
The fundamental $\phi$ that is conjugate imaginary to $\psi(\xi')$ will be $\phi(\xi')$.

The notation of primes and multiple primes to denote the eigenvalues of
an observable is very convenient and will be used generally in the future. A new
notation for the representatives of states and observables will now be introduced,
which will greatly increase the symmetry in our equations. A general $\psi$-symbol
$\psi$ is represented by a set of numbers, each of which is associated with one of
the fundamental $\psi$'s and thus with one set of eigenvalues. That particular one of
the set of numbers which is associated with the eigenvalues $\xi'_1, \xi'_2, \ldots, \xi'_n$ will be
written $(\xi'_1\xi'_2\ldots\xi'_n|)$, or $(\xi'|)$ for brevity. When it is necessary to particularize
the $\psi$-symbol by a suffix, $k$ say, we can insert this suffix in the representative
of $\psi_k$ to the right of the vertical line, thus $(\xi'_1\xi'_2\ldots\xi'_n|k)$ or $(\xi'|k)$. The reason
for this notation is that, as we shall see later, there is a remarkable symmetry in
the way $(\xi'|k)$ involves the set of numbers $\xi'$, referring to one of the fundamental
$\psi$'s, on the one hand, and the parameter $k$ which specifies the $\psi$ that is being
represented, on the other. This symmetry is exactly expressed when one puts
the $\xi'$'s and the $k$ to the left and right respectively. In a corresponding way
we shall write the representative of a general $\phi$-symbol as $(|\xi')$ and of a particular
one, $\phi_k$, as $(k|\xi')$. For the representative of an observable $\alpha$, we shall write
the matrix element $\alpha_{pq}$, associated with the fundamental states $\psi_p$ and $\psi_q$,
as $(\xi'_1\xi'_2\ldots\xi'_n|\alpha|\xi''_1\xi''_2\ldots\xi''_n)$, or as $(\xi'|\alpha|\xi'')$ for brevity, where the $\xi'$'s and $\xi''$'s
are the eigenvalues belonging to the fundamental states $\psi_p$ and $\psi_q$ respectively, or
$\psi(\xi')$ and $\psi(\xi'')$, as they would be written in the new notation.

Some of the equations of the preceding chapter will now be written in the new
notation to illustrate how it runs. Equations (3) and (4) become


$$(\xi'|\alpha+\beta|\xi'') = (\xi'|\alpha|\xi'') + (\xi'|\beta|\xi''),$$
$$(\xi'|c\alpha|\xi'') = c\,(\xi'|\alpha|\xi'').$$


Equation (1) or (24), defining the representative of a $\psi$-symbol, becomes, if we take
for definiteness the case when each of the $\xi'_m$'s has a continuous range of values,


$$\psi = \int \psi(\xi')\,d\xi'\,(\xi'|), \qquad (1)$$


where $d\xi'$ is short for the product $d\xi'_1\,d\xi'_2\ldots d\xi'_n$, and only one integral sign is
written to denote integration over all these variables. It should be noted how,
when one puts the $d\xi'$ in the proper place, all the $\xi'$'s in (1) occur together. This
is the new form of the suffix rule given near the end of §20. Equation (2) or (32)
of the preceding chapter, defining the representatives of an observable $\alpha$, becomes
in the same way


$$\alpha\psi(\xi'') = \int \psi(\xi')\,d\xi'\,(\xi'|\alpha|\xi''). \qquad (2)$$


Again, the multiplication law for the representatives of two observables,
equation (7) or (33), becomes


$$(\xi'|\alpha\beta|\xi'') = \int (\xi'|\alpha|\xi''')\,d\xi'''\,(\xi'''|\beta|\xi''),$$

and that for the representatives of an observable and a $\psi$, equation (10) or (34),
becomes


$$(\xi'|l) = \int (\xi'|\alpha|\xi'')\,d\xi''\,(\xi''|k), \qquad (3)$$


where $k$ specifies the $\psi$-symbol $\psi_k$ and $l$ specifies $\psi_l = \alpha\psi_k$. The conjugate complex
$\bar\alpha$ of an observable $\alpha$ is now represented by


$$(\xi'|\bar\alpha|\xi'') = \overline{(\xi''|\alpha|\xi')}, \qquad (4)$$


corresponding to (23), and the representatives $(\xi'|)$ and $(|\xi')$ of a $\psi$ and its
conjugate imaginary $\phi$ are conjugate complex quantities.

The representation we are now considering is built up from a number of
commuting observables $\xi_1, \xi_2, \ldots, \xi_n$, whose simultaneous eigen-$\psi$'s are taken as
fundamental $\psi$'s. Let us determine how one of these observables, $\xi_m$ say, is itself
represented. Putting $\xi_m$ for $\alpha$ in (2), we get


$$\xi_m\psi(\xi'') = \int \psi(\xi')\,d\xi'\,(\xi'|\xi_m|\xi''). \qquad (5)$$


But since $\psi(\xi'')$ is an eigen-$\psi$ of $\xi_m$ belonging to the eigenvalue $\xi''_m$ we have


$$\xi_m\psi(\xi'') = \xi''_m\,\psi(\xi'') = \int \psi(\xi')\,d\xi'\,\xi''_m\,\delta(\xi'-\xi''), \qquad (6)$$


where $\delta(\xi'-\xi'')$ is short for the product $\delta(\xi'_1-\xi''_1)\,\delta(\xi'_2-\xi''_2)\ldots\delta(\xi'_n-\xi''_n)$. Equating
coefficients on the right-hand sides of (5) and (6), we obtain


$$(\xi'|\xi_m|\xi'') = \xi''_m\,\delta(\xi'-\xi''). \qquad (7)$$


This, of course, is equal to $\xi'_m\,\delta(\xi'-\xi'')$ and is therefore symmetrical between the
singly and doubly primed symbols.
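
The diagonal form of this representative can be checked in the simplest discrete case, where the $\delta$ function is replaced by a two-suffix $\delta$ symbol. The sketch below is an illustration only: it assumes a hypothetical two-state system, takes for $\xi$ an arbitrary real symmetric matrix `xi_op` with known eigenvectors, and forms the numbers $\phi(\xi')\,\xi\,\psi(\xi'')$. The names `xi_op`, `eigvals`, `eigvecs`, `apply` and `inner` are introduced here and do not occur in the text.

```python
import math

s = 1 / math.sqrt(2)
xi_op = [[0, 1], [1, 0]]        # hypothetical observable xi (an assumption)
eigvals = [1.0, -1.0]           # its eigenvalues xi'
eigvecs = [[s, s], [s, -s]]     # the corresponding normalized eigenvectors

def apply(m, v):
    """Matrix m acting on the column vector v."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def inner(u, v):
    """The number phi(u) psi(v): conjugate the left factor, then sum."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

# rep[i][j] plays the part of (xi'|xi|xi'') = phi(xi') xi psi(xi'').
rep = [[inner(eigvecs[i], apply(xi_op, eigvecs[j])) for j in range(2)]
       for i in range(2)]

off_diagonal = max(abs(rep[0][1]), abs(rep[1][0]))
diagonal = [rep[0][0], rep[1][1]]
```

The off-diagonal elements vanish and the diagonal elements reproduce the eigenvalues, apart from rounding.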

If the $\xi'$'s take on discrete sets of values instead of continuous ranges, we should
obtain instead of (7)

$$(\xi'|\xi_m|\xi'') = \xi''_m\,\delta_{\xi'\xi''},$$

where $\delta_{\xi'\xi''}$ is short for the product $\delta_{\xi'_1\xi''_1}\,\delta_{\xi'_2\xi''_2}\ldots\delta_{\xi'_n\xi''_n}$. Thus the observable $\xi_m$ is
represented by a diagonal matrix, whose diagonal elements are its eigenvalues $\xi'_m$.
A diagonal matrix, in the case of continuous ranges of rows and columns,
may conveniently be defined as one whose general element $(\xi',\xi'')$ involves the $\delta$
function $\delta(\xi'-\xi'')$ as a factor, like the right-hand side of (7), and the coefficient
of the $\delta$ function may be defined as the general diagonal element. With these
definitions the above law in italics for the representative of $\xi_m$ holds in all cases.
The appropriateness of this definition for a diagonal matrix in the continuous
case rests on the fact that, as is easily verified, it makes two diagonal matrices
always commute, which is one of the most important properties of diagonal
matrices in the discrete case. For this reason it would not be sufficient to define
a diagonal matrix in the continuous case merely as one whose general element
$(\xi',\xi'')$ vanishes except when the $\xi''$'s differ infinitely little from the $\xi'$'s.

If $f(\xi)$ is any function of the $\xi$'s, then its representative is found to be,
by a similar argument to that leading to (7),


$$(\xi'|f(\xi)|\xi'') = f(\xi')\,\delta(\xi'-\xi''). \qquad (8)$$


The coefficient $f(\xi')$ must, of course, have a meaning since the function $f$ must
be defined for each of the eigenvalues of the $\xi$'s. Thus the representation
based on the simultaneous eigen-$\psi$'s of a set of observables as fundamental $\psi$'s
is such that the representative of each of the $\xi$'s and of any function of them
is a diagonal matrix. Conversely, every diagonal matrix in this representation
represents a function of the $\xi$'s, this function being specified by the general diagonal
element $(\xi',\xi'')$ regarded as a function of the variables $\xi'$.

Thus if we take any set of observables that commute, there will exist
a representation in which each of these observables simultaneously is represented
by a diagonal matrix. If the set of observables is a complete one, then
the representation will be completely determined by these observables, except
for arbitrary phases which arise from the fact that a simultaneous eigen-$\psi$ of
these observables may be multiplied by any numerical factor of modulus unity
without any of the conditions defining it being invalidated. For example, we can
multiply each $\psi(\xi')$ by $\exp[-if(\xi')]$, where $f(\xi')$ is an arbitrary real function of
the $\xi'$'s. This will require every representative of a state, $(\xi'|)$, to be multiplied by
$\exp[if(\xi')]$, as follows from (1), and every representative of an observable, $(\xi'|\alpha|\xi'')$, to be multiplied
by $\exp i[f(\xi') - f(\xi'')]$. A diagonal element $(\xi'|\alpha|\xi')$ remains unaltered by this
transformation, as is necessary on account of its having the physical meaning of
an average. The arbitrary phases which thus arise in the representatives are usually
unimportant and trivial, so that we may count a representation as being completely
determined by the observables that are diagonal in it. This fact is already implied
in our notation, since the only indication in a representative of the representation
to which it belongs are the letters denoting the observables that are diagonal.
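
The effect of the arbitrary phases can be made concrete in a small numerical sketch. Everything in it is an assumption made for illustration: a hypothetical two-state system with the fundamental states `basis`, arbitrary phases `f`, an arbitrary normalized state `psi` and an arbitrary real observable `alpha`. Each fundamental state is multiplied by $\exp[-if(\xi')]$ and the representatives are recomputed.

```python
import cmath

def inner(u, v):
    """The number phi(u) psi(v): conjugate the left factor, then sum."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def apply(m, v):
    """Matrix m acting on the column vector v."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

basis = [[1, 0], [0, 1]]                 # assumed fundamental states psi(xi')
f = [0.3, 1.1]                           # arbitrary real phases f(xi')
new_basis = [[cmath.exp(-1j * f[k]) * c for c in basis[k]] for k in range(2)]

psi = [0.6, 0.8j]                        # arbitrary normalized state
alpha = [[1, 2 - 1j], [2 + 1j, -1]]      # arbitrary real (Hermitian) observable

def state_rep(b):
    """Representative (xi'|) of psi referred to the fundamental states b."""
    return [inner(b[k], psi) for k in range(2)]

def obs_rep(b):
    """Representative (xi'|alpha|xi'') referred to the fundamental states b."""
    return [[inner(b[i], apply(alpha, b[j])) for j in range(2)]
            for i in range(2)]

old_s, new_s = state_rep(basis), state_rep(new_basis)
old_a, new_a = obs_rep(basis), obs_rep(new_basis)

# Ratios predicted in the text: exp(i f) for states, exp i(f' - f'') for
# observables, and unity for the diagonal elements.
state_err = max(abs(new_s[k] - cmath.exp(1j * f[k]) * old_s[k])
                for k in range(2))
obs_err = max(abs(new_a[i][j] - cmath.exp(1j * (f[i] - f[j])) * old_a[i][j])
              for i in range(2) for j in range(2))
diag_err = max(abs(new_a[k][k] - old_a[k][k]) for k in range(2))
```

All three discrepancies should be zero apart from rounding, and the moduli of the representatives are evidently unchanged.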

The representations considered in this section, in which each fundamental
$\psi$ is a simultaneous eigen-$\psi$ of a set of real commuting observables, are not of
a special kind, since every orthogonal representation has this property. In fact,
if we take any representation, having $\psi_p, \psi_q, \ldots$ as fundamental $\psi$'s, we can then
form any diagonal matrix whose general element $\xi_{pq}$ is of the form $a_p\,\delta(p-q)$,
where $a_p$ is a real function of $p$, and consider this diagonal matrix as representing
an observable $\xi$. This observable will be real if the representation is orthogonal.
We shall then have

$$\xi\psi_q = \int \psi_p\,dp\,\xi_{pq} = \int \psi_p\,dp\,a_p\,\delta(p-q) = a_q\psi_q,$$


so that each fundamental $\psi$, $\psi_q$, is an eigen-$\psi$ of $\xi$. In the many-dimensional
case, when several suffixes $p$ or $q$ are required to label a fundamental $\psi$, we can
take several diagonal matrices and each will represent an observable $\xi$ for which
the fundamental $\psi$'s are all eigen-$\psi$'s. We can obtain in this way a sufficient number
of observables $\xi$ having the fundamental $\psi$'s as eigen-$\psi$'s to form a complete set.
The notation and methods of the present section can then be applied.


27. Canonical Transformations 


If we take two representations, based respectively on the fundamental $\psi$'s $\psi(\xi')$,
which are the simultaneous eigen-$\psi$'s of a set of commuting observables $\xi_m$,
and the fundamental $\psi$'s $\psi(\eta')$, which are the simultaneous eigen-$\psi$'s of a set of
commuting observables $\eta_m$, then an arbitrary $\psi$ will have the two representatives
$(\xi'|)$ and $(\eta'|)$, which are functions of the sets of variables $\xi'_m$ and $\eta'_m$ respectively.
Since a $\psi$ is completely determined by its representative in any one representation,
there must be a connexion between the two representatives $(\xi'|)$ and $(\eta'|)$ such
that either is determined by the other. We shall now investigate the form of
this connexion.

From the definition of the representative $(\eta'|)$ we have, if we take for definiteness
the case of integrals,


$$\psi = \int \psi(\eta')\,d\eta'\,(\eta'|). \qquad (9)$$


Now each fundamental $\psi$ of the $\eta$-representation, $\psi(\eta')$, will itself have
a representative in the $\xi$-representation. We may write this representative $(\xi'|\eta')$,
with $\eta'$ on the right to show which $\psi$ it represents. We shall then have


$$\psi(\eta') = \int \psi(\xi')\,d\xi'\,(\xi'|\eta') \qquad (10)$$


for the definition of $(\xi'|\eta')$. Substituting this value for $\psi(\eta')$ in the right-hand side
of (9), we get


$$\psi = \int\!\!\int \psi(\xi')\,d\xi'\,(\xi'|\eta')\,d\eta'\,(\eta'|),$$


which gives, on comparison with equation (1) which defines $(\xi'|)$,


$$(\xi'|) = \int (\xi'|\eta')\,d\eta'\,(\eta'|). \qquad (11)$$

This is the transformation equation which gives the $\xi$-representative of a $\psi$-symbol
in terms of its $\eta$-representative. The corresponding equation which gives $(\eta'|)$ in
terms of $(\xi'|)$ may be shown in the same way to be


$$(\eta'|) = \int (\eta'|\xi')\,d\xi'\,(\xi'|), \qquad (12)$$


where $(\eta'|\xi')$ is the representative of the fundamental $\psi$, $\psi(\xi')$, in
the $\eta$-representation.

The two representatives $(\xi'|)$ and $(\eta'|)$ are thus linear functions of one another.
The expressions $(\xi'|\eta')$ and $(\eta'|\xi')$ which enable us to pass from one to the other
will be called transformation functions. They are each functions of the two sets of
variables $\xi'$ and $\eta'$. We can obtain an explicit expression for $(\xi'|\eta')$ by multiplying
equation (10) by $\phi(\xi'')$ to the left, a process* corresponding to that used for getting
equation (40) of the preceding chapter. The result is


$$(\xi'|\eta') = \phi(\xi')\,\psi(\eta'). \qquad (13)$$


Similarly it may be shown that

$$(\eta'|\xi') = \phi(\eta')\,\psi(\xi'). \qquad (14)$$


Hence $(\xi'|\eta')$ and $(\eta'|\xi')$ are conjugate complex quantities.

The transformation functions must satisfy certain conditions in order that (11)
and (12) may be consistent. If we substitute for $(\eta'|)$ in (11) its value given by
(12), we get


$$(\xi'|) = \int\!\!\int (\xi'|\eta')\,d\eta'\,(\eta'|\xi'')\,d\xi''\,(\xi''|).$$

But we have also


$$(\xi'|) = \int \delta(\xi'-\xi'')\,d\xi''\,(\xi''|).$$


Since these equations must hold for an arbitrary function $(\xi''|)$ of the variables $\xi''$,
we can equate the coefficients of $(\xi''|)$ on their right-hand sides. This gives


$$\int (\xi'|\eta')\,d\eta'\,(\eta'|\xi'') = \delta(\xi'-\xi''). \qquad (15)$$


*that different eigenstates are orthogonal

An alternative way of obtaining this result is to apply equation (11) to
the $\psi$-symbol $\psi(\xi'')$. Since the $\eta$-representative of this $\psi$-symbol is $(\eta'|\xi'')$,
the right-hand side of (11) becomes $\int (\xi'|\eta')\,d\eta'\,(\eta'|\xi'')$, while the left-hand side
becomes the $\xi$-representative of $\psi(\xi'')$, which is, of course, $\delta(\xi'-\xi'')$. The equation
corresponding to (15) in which $\xi$ and $\eta$ have changed places, namely,


$$\int (\eta'|\xi')\,d\xi'\,(\xi'|\eta'') = \delta(\eta'-\eta''), \qquad (16)$$


may be similarly obtained. Equations (15) and (16) are the only conditions which
the transformation functions must satisfy identically. They are of the nature of
orthogonality and normalization conditions.
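
These conditions can be illustrated in the discrete case, where the integrals become sums and the $\delta$ functions become two-suffix $\delta$ symbols. The sketch below assumes a hypothetical two-state system with two orthonormal sets of fundamental states, `xi_basis` and `eta_basis`, chosen arbitrarily for illustration. It forms the transformation functions from the explicit expressions (13) and (14) and checks the two conditions and the conjugate relation.

```python
import math

s = 1 / math.sqrt(2)
xi_basis = [[1, 0], [0, 1]]                # assumed fundamental psi(xi')
eta_basis = [[s, s * 1j], [s, -s * 1j]]    # assumed fundamental psi(eta')

def inner(u, v):
    """The number phi(u) psi(v): conjugate the left factor, then sum."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

# xi_eta[i][j] is (xi'|eta') = phi(xi') psi(eta'), as in (13);
# eta_xi[j][i] is (eta'|xi') = phi(eta') psi(xi'), as in (14).
xi_eta = [[inner(x, e) for e in eta_basis] for x in xi_basis]
eta_xi = [[inner(e, x) for x in xi_basis] for e in eta_basis]

def compose(a, b):
    """Sum over the shared middle label, the discrete form of the integral."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

cond15 = compose(xi_eta, eta_xi)   # discrete analogue of the left side of (15)
cond16 = compose(eta_xi, xi_eta)   # discrete analogue of the left side of (16)

def distance_from_unit(m):
    return max(abs(m[i][j] - (1 if i == j else 0))
               for i in range(2) for j in range(2))

conj_err = max(abs(eta_xi[j][i] - xi_eta[i][j].conjugate())
               for i in range(2) for j in range(2))
```

Both `cond15` and `cond16` should reduce to the unit matrix, which in matrix language says that the array of transformation functions is unitary.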

The transformation of the representatives of $\phi$-symbols may be treated in
the same way. We should then find, for instance, the equation


$$(|\eta') = \int (|\xi')\,d\xi'\,(\xi'|\eta')$$


as the transformation equation which gives the representative $(|\eta')$ of an arbitrary
$\phi$-symbol in terms of its representative $(|\xi')$, where the quantity $(\xi'|\eta')$ is now
defined as the $\eta$-representative of the fundamental $\phi$, $\phi(\xi')$, i.e. by the equation


$$\phi(\xi') = \int (\xi'|\eta')\,d\eta'\,\phi(\eta').$$


If we multiply this equation by $\psi(\eta'')$ on the right, we obtain, as an explicit
expression for this $(\xi'|\eta')$,

$$\phi(\xi')\,\psi(\eta'') = (\xi'|\eta''),$$

which is the same as (13). Thus this quantity $(\xi'|\eta')$, defined as the $\eta$-representative
of $\phi(\xi')$, is the same as our previous one defined as the $\xi$-representative of
$\psi(\eta')$, so that our notation of using the same symbol for them both is justified.
The symmetry which thus exists in the way the quantity $(\xi'|\eta')$ involves the $\xi'$'s and
$\eta'$'s is the same as that which was referred to in the preceding section when the new
notation for the representative of a state was introduced, since any representative
$(\xi'|k)$ of a specified $\psi$-symbol $\psi_k$, when suitably normalized, may be regarded as
the transformation function connecting the $\xi$-representation with a representation
in which $\psi_k$ is one of the fundamental states.

Owing to the arbitrary phases occurring in representations, there will
be a corresponding amount of arbitrariness in the transformation functions.
If the fundamental states $\psi(\xi')$, $\psi(\eta')$ are multiplied by $\exp[-if(\xi')]$, $\exp[-ig(\eta')]$
respectively, $f$ and $g$ being arbitrary real functions, the transformation function
$(\xi'|\eta') = \phi(\xi')\psi(\eta')$ will get multiplied by $\exp\{i[f(\xi') - g(\eta')]\}$. Thus the modulus of
the transformation function is quite definite, the indeterminacy being only in
its phase.

The connexion between the representatives of an observable $\alpha$ in the two
representations may be easily obtained in a variety of different ways. We can,
for instance, use the explicit expression for the representative of $\alpha$ given by
equation (41) of the preceding chapter. Applying this to the $\xi$-representation,
we get

$$(\xi'|\alpha|\xi'') = \phi(\xi')\,\alpha\,\psi(\xi'').$$

If we now substitute for the right-hand side, which consists of the product of three
abstract symbols, their representatives in the $\eta$-representation, we get


$$(\xi'|\alpha|\xi'') = \int\!\!\int (\xi'|\eta')\,d\eta'\,(\eta'|\alpha|\eta'')\,d\eta''\,(\eta''|\xi''), \qquad (17)$$


which gives the $\xi$-representative in terms of the $\eta$-representative. Similarly we may
obtain the result


$$(\eta'|\alpha|\eta'') = \int\!\!\int (\eta'|\xi')\,d\xi'\,(\xi'|\alpha|\xi'')\,d\xi''\,(\xi''|\eta''), \qquad (18)$$


giving the $\eta$-representative in terms of the $\xi$-representative. These are
the transformation equations for the representatives of an observable.
Either representative is a linear function of the other, and the same transformation
functions are required for passing from one to the other as for the representatives
of states.
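
The transformation equation (17) can be tried out in the discrete case, with sums in place of integrals. The following sketch rests on assumptions introduced only for illustration: a hypothetical two-state system, two orthonormal sets of fundamental states `xi_basis` and `eta_basis`, and an arbitrary real observable `alpha`. The representative of `alpha` is computed directly in each representation, and the $\xi$-representative is then recovered from the $\eta$-representative.

```python
import math

s = 1 / math.sqrt(2)
xi_basis = [[1, 0], [0, 1]]            # assumed fundamental psi(xi')
eta_basis = [[s, s], [s, -s]]          # assumed fundamental psi(eta')
alpha = [[1, 2 - 1j], [2 + 1j, -1]]    # arbitrary real (Hermitian) observable

def inner(u, v):
    """The number phi(u) psi(v): conjugate the left factor, then sum."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def apply(m, v):
    """Matrix m acting on the column vector v."""
    return [sum(m[i][j] * v[j] for j in range(2)) for i in range(2)]

def rep(basis):
    """Representative phi(b') alpha psi(b'') referred to the given states."""
    return [[inner(basis[i], apply(alpha, basis[j])) for j in range(2)]
            for i in range(2)]

xi_eta = [[inner(x, e) for e in eta_basis] for x in xi_basis]   # (xi'|eta')
eta_xi = [[inner(e, x) for x in xi_basis] for e in eta_basis]   # (eta'|xi')

alpha_xi, alpha_eta = rep(xi_basis), rep(eta_basis)

# Discrete form of the right-hand side of (17): sums over eta' and eta''.
rhs17 = [[sum(xi_eta[i][k] * alpha_eta[k][l] * eta_xi[l][j]
              for k in range(2) for l in range(2)) for j in range(2)]
         for i in range(2)]

max_diff = max(abs(rhs17[i][j] - alpha_xi[i][j])
               for i in range(2) for j in range(2))
```

The quantity `max_diff` should vanish apart from rounding. In matrix language the operation is a similarity transformation by the array of transformation functions.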

If we now take a third representation, $\zeta$ say, we shall have transformation
functions $(\zeta'|\xi')$, $(\xi'|\zeta')$, connecting it with the $\xi$-representation, and transformation
functions $(\zeta'|\eta')$, $(\eta'|\zeta')$, connecting it with the $\eta$-representation. There are simple
relations between the transformation functions. Equation (13), with $\zeta$ instead of
$\eta$, gives us

$$(\xi'|\zeta') = \phi(\xi')\,\psi(\zeta').$$

If we substitute for the right-hand side, which consists of the product of two
abstract symbols, their representatives in the $\eta$-representation, we get


$$(\xi'|\zeta') = \int (\xi'|\eta')\,d\eta'\,(\eta'|\zeta'). \qquad (19)$$


The conjugate complex equation, which could be deduced independently in
the same way, is


$$(\zeta'|\xi') = \int (\zeta'|\eta')\,d\eta'\,(\eta'|\xi'). \qquad (20)$$


Equations (19) and (20) give the $\xi$, $\zeta$ transformation functions in terms of the $\xi$,
$\eta$ and $\eta$, $\zeta$ ones.
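
The composition law (19) may be checked in the same discrete setting. The sketch assumes three arbitrary orthonormal sets of fundamental states for a hypothetical two-state system, named `xi_basis`, `eta_basis` and `zeta_basis` here, and compares the $\xi$, $\zeta$ transformation function computed directly with the one built from the $\xi$, $\eta$ and $\eta$, $\zeta$ functions.

```python
import math

s = 1 / math.sqrt(2)
xi_basis = [[1, 0], [0, 1]]                  # assumed fundamental psi(xi')
eta_basis = [[s, s], [s, -s]]                # assumed fundamental psi(eta')
zeta_basis = [[s, s * 1j], [s, -s * 1j]]     # assumed fundamental psi(zeta')

def inner(u, v):
    """The number phi(u) psi(v): conjugate the left factor, then sum."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def transf(a, b):
    """Transformation function (a'|b') = phi(a') psi(b'), as in (13)."""
    return [[inner(x, y) for y in b] for x in a]

xi_eta = transf(xi_basis, eta_basis)
eta_zeta = transf(eta_basis, zeta_basis)
xi_zeta = transf(xi_basis, zeta_basis)

# Discrete form of the right-hand side of (19): a sum over eta'.
rhs19 = [[sum(xi_eta[i][k] * eta_zeta[k][j] for k in range(2))
          for j in range(2)] for i in range(2)]

max_diff = max(abs(rhs19[i][j] - xi_zeta[i][j])
               for i in range(2) for j in range(2))
```

The agreement expresses the fact that successive transformations compose by the ordinary rule of matrix multiplication.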

If we multiply equation (17) by $d\xi''\,(\xi''|\eta''')$, putting the new factor on
the right-hand side of each term in order to maintain the 'fluency' of the notation,
and integrate with respect to $\xi''$, we obtain

$$\int (\xi'|\alpha|\xi'')\,d\xi''\,(\xi''|\eta''') = \int\!\!\int\!\!\int (\xi'|\eta')\,d\eta'\,(\eta'|\alpha|\eta'')\,d\eta''\,(\eta''|\xi'')\,d\xi''\,(\xi''|\eta''')$$
$$\qquad = \int\!\!\int (\xi'|\eta')\,d\eta'\,(\eta'|\alpha|\eta'')\,d\eta''\,\delta(\eta''-\eta''')$$


with the help of (16). Hence


$$\int (\xi'|\alpha|\xi'')\,d\xi''\,(\xi''|\eta''') = \int (\xi'|\eta')\,d\eta'\,(\eta'|\alpha|\eta'''). \qquad (21)$$


We shall call either side of this equation $(\xi'|\alpha|\eta''')$ and consider it as
the representative of the observable $\alpha$ in a mixed representation $(\xi, \eta)$. It is, in fact,
a matrix sufficient to determine the observable $\alpha$ and differs from the representative
matrices we have previously considered only in that its rows and its columns refer
to two different sets of fundamental states and are therefore no longer in one-one
correspondence with each other. The representative matrices of two observables
in mixed representations can be added provided they are both in the same mixed
representation, i.e. we have


$$(\xi'|\alpha+\beta|\eta') = (\xi'|\alpha|\eta') + (\xi'|\beta|\eta').$$


Also they can be multiplied if they are in two different mixed representations
such that the columns (specified by the letter on the right-hand side) of the first
factor refer to the same set of fundamental states as, and are thus in one-one
correspondence with, the rows of the second, i.e. we can multiply $(\xi'|\alpha|\eta')$ into
$(\eta'|\beta|\zeta')$ to give a product


$$(\xi'|\alpha\beta|\zeta') = \int (\xi'|\alpha|\eta')\,d\eta'\,(\eta'|\beta|\zeta').$$


It should be noticed that the representative of unity in the mixed
$(\xi, \eta)$ representation, i.e. $(\xi'|1|\eta')$, is just the transformation function $(\xi'|\eta')$ itself,
as follows at once from the definition (21). The terms 'diagonal matrix' and
'diagonal element' of course have no meaning when applied to representative
matrices in mixed representations. Again, the representatives of the $\xi$'s and
$\eta$'s themselves in the mixed $(\xi, \eta)$ representation are given by the following
expressions, as is easily verified by using the left- and right-hand sides of (21)
respectively:


$$(\xi'|\xi_m|\eta') = \xi'_m\,(\xi'|\eta'),$$
$$(\xi'|\eta_m|\eta') = (\xi'|\eta')\,\eta'_m. \qquad (22)$$

These representatives are thus expressible directly in terms of the transformation
function.
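
The expressions (22) can be verified in a discrete sketch. It is assumed here, as follows from (21) together with (13), that the mixed representative of an observable $\alpha$ is the number $\phi(\xi')\,\alpha\,\psi(\eta')$. The two-state system, the observables `xi_op` and `eta_op` and their eigenvectors are hypothetical choices made only for illustration.

```python
import math

s = 1 / math.sqrt(2)
xi_op = [[1, 0], [0, -1]]          # assumed observable xi, diagonal here
eta_op = [[0, 1], [1, 0]]          # assumed observable eta
xi_vals, xi_basis = [1.0, -1.0], [[1, 0], [0, 1]]
eta_vals, eta_basis = [1.0, -1.0], [[s, s], [s, -s]]

def inner(u, v):
    """The number phi(u) psi(v): conjugate the left factor, then sum."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def apply(m, v):
    """Matrix m acting on the column vector v."""
    return [sum(m[i][j] * v[j] for j in range(2)) for i in range(2)]

def mixed(op):
    """Mixed representative (xi'|op|eta') = phi(xi') op psi(eta')."""
    return [[inner(x, apply(op, e)) for e in eta_basis] for x in xi_basis]

xi_eta = [[inner(x, e) for e in eta_basis] for x in xi_basis]   # (xi'|eta')

# Discrepancies from the two expressions in (22).
err_xi = max(abs(mixed(xi_op)[i][j] - xi_vals[i] * xi_eta[i][j])
             for i in range(2) for j in range(2))
err_eta = max(abs(mixed(eta_op)[i][j] - xi_eta[i][j] * eta_vals[j])
              for i in range(2) for j in range(2))
```

Both discrepancies should vanish apart from rounding: $\xi_m$ acts on the row label and $\eta_m$ on the column label.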

The equations of this section have all been written down for the case when
the parameters $\xi'$, $\eta'$, ..., labelling fundamental states take on continuous ranges
of values. The necessary modifications to be made when some or all of them take
on discrete sets, or both discrete sets and continuous ranges, of values are obvious.
If in one representation $\xi$ the $\xi'$'s take on, say, continuous ranges of values, then it is
not necessary that in another representation $\eta$, applying to the same dynamical
system, the $\eta'$'s should also take on continuous ranges of values, although if in one
representation the number of fundamental states is finite, then it must be the same
in any other representation.

The transformations here discussed from one representation to another may 
be called canonical transformations. One must take care not to confuse 
them with contact transformations, defined in §19, as was frequently done in 
the earlier literature on quantum mechanics. The two kinds of transformation 
are mathematically of the same form, as one sees if one writes the canonical
transformation equations (17) and (18) symbolically with $S$ and $S^{-1}$ for
the transformation functions $(\xi'|\eta')$ and $(\eta'|\xi')$, but they have quite different
meanings. The canonical transformation is a transformation from one 
representation of observables to another representation of the same observables, 
while the contact transformation is a transformation from one set of observables 
to another different set of observables. For the contact transformation the new 
observables are connected with one another by the same algebraic and functional 
relationships as the original ones, while the corresponding results for the canonical 
transformation merely express the condition that the new representatives are 
entitled to be called representatives of the same observables. The contact 
transformation has its analogue in classical mechanics, as has been already 
mentioned, but the canonical transformation, which is the more important one 
in quantum mechanics, has, of course, no such analogue, since in the classical 
theory we do not deal with representations. 


28. Probability Amplitudes 


Suppose observations to be made of each of a set of commuting observables $\xi_m$
when the system is in a given state $\psi$. The probability of any given set of
results being obtained is equal to, according to §18, the square of the modulus
of the corresponding coefficient in the expansion of $\psi$ (which is assumed to be
normalized) in terms of normalized simultaneous eigen-$\psi$'s of the observables $\xi_m$.
If the observables $\xi_m$ form a complete set, there will be only one simultaneous
eigen-$\psi$ for each set of eigenvalues $\xi'_m$ and the coefficients in the expansion of $\psi$
will form a representative of $\psi$, denoted by $(\xi'|)$. The probability of the set of
results $\xi'_m$ being obtained now becomes $|(\xi'|)|^2$. There is thus a physical meaning
for the $\xi$-representative of any normalized $\psi$, or at least for the modulus of this
representative, in terms of the probability for a given result being obtained for
a maximum observation consisting in measuring the complete set of observables $\xi_m$.
The same physical meaning can, of course, be given to the representative of
any normalized $\phi$, which is just the conjugate complex of that of the conjugate
imaginary $\psi$.

Take now the case when $\psi$ is one of the fundamental $\psi$'s, $\psi(\eta')$, of another
representation $\eta$. The probability of the results $\xi'$ being obtained is now given
by $|(\xi'|\eta')|^2$, i.e. by the square of the modulus of the transformation function.
But the state $\psi(\eta')$ is the one for which the observables $\eta$ certainly have
the values $\eta'$. Thus $|(\xi'|\eta')|^2$ gives the probability of the observables $\xi$ having
the values $\xi'$ when the $\eta$'s are known to have the values $\eta'$. For this reason
the expression $(\xi'|\eta')$ is called by Pascual Jordan a probability amplitude. There is,
as we saw in the preceding section, an uncertainty in its phase, but its modulus is
quite definite. The square of its modulus is an ordinary probability. Since


$$|(\xi'|\eta')|^2 = (\xi'|\eta')\,(\eta'|\xi') = |(\eta'|\xi')|^2,$$


we have the reciprocal theorem, that the probability of the $\xi$'s having the values
$\xi'$ when the $\eta$'s are given to have the values $\eta'$ is equal to the probability of the $\eta$'s
having the values $\eta'$ when the $\xi$'s are given to have the values $\xi'$.
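
The reciprocal theorem is easily illustrated in the discrete case. The sketch below assumes a hypothetical two-state system in which the fundamental states of the $\eta$-representation are obtained from those of the $\xi$-representation by a rotation through an arbitrary angle `theta`. These are illustrative choices, not taken from the text.

```python
import math

theta = 0.4                                    # arbitrary rotation angle
xi_basis = [[1.0, 0.0], [0.0, 1.0]]            # assumed fundamental psi(xi')
eta_basis = [[math.cos(theta), math.sin(theta)],
             [-math.sin(theta), math.cos(theta)]]   # assumed psi(eta')

def inner(u, v):
    """The number phi(u) psi(v): conjugate the left factor, then sum."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

# p_xi_given_eta[i][j] = |(xi'|eta')|^2, p_eta_given_xi[j][i] = |(eta'|xi')|^2
p_xi_given_eta = [[abs(inner(x, e)) ** 2 for e in eta_basis] for x in xi_basis]
p_eta_given_xi = [[abs(inner(e, x)) ** 2 for x in xi_basis] for e in eta_basis]

reciprocity_err = max(abs(p_xi_given_eta[i][j] - p_eta_given_xi[j][i])
                      for i in range(2) for j in range(2))

# For each given eta' the probabilities of the various xi' add up to unity.
totals = [sum(p_xi_given_eta[i][j] for i in range(2)) for j in range(2)]
```

The probabilities come out as $\cos^2\theta$ and $\sin^2\theta$, they are the same whichever set of observables is regarded as given, and for each given $\eta'$ they sum to unity.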

When the $\xi'$'s take on continuous ranges of values, then, as mentioned in §23,
the fundamental $\psi$'s of a representation must be multiplied by an infinitely small
numerical coefficient in order that they may be properly normalized for the purpose
of physical interpretations. Further, the theorem of §18 that we have just used,
giving probabilities in terms of the coefficients of an expansion, is no longer true
when the expansion consists of an integral. For these reasons the expression
we have obtained for the probability of the $\xi$'s having particular values for a given
state does not hold in the continuous case. But in the continuous case in practice
we need to know only the probability of the $\xi$'s having values lying within specified
ranges. The probability of their having particular values is zero, as could be
deduced formally from the theory. The connexion between the probability for
the state $\psi$ of the $\xi$'s having values lying within small specified ranges and
the representative of $\psi$, when the fundamental $\psi$'s are normalized in accordance
with equation (37) or (39) of §23, will now be obtained. The method used will be
to obtain the case of continuous $\xi'$'s as a limiting form of the case of discrete $\xi'$'s
when there are very many of them lying very close together.

Take for definiteness the case when there is only one $\xi$ and suppose that
it has a very large number of discrete eigenvalues $\xi'$ lying very close together.
Let the number of eigenvalues per unit range of $\xi'$ be $s$, which can vary with $\xi'$
in an arbitrary way. Suppose now that an arbitrary normalized $\psi$ is expanded in
terms of the eigen-$\psi$'s, $\psi_{\xi'}$, which are correctly normalized for the purpose of physical
interpretations, i.e.

$$\phi_{\xi'}\psi_{\xi'} = 1, \qquad (23)$$

so that we have

$$\psi = \sum_{\xi'} c_{\xi'}\,\psi_{\xi'}. \qquad (24)$$


Then $|c_{\xi'}|^2$ is the probability of $\xi$ having the value $\xi'$ for this state $\psi$. We may
assume that $c_{\xi'}$ varies only slowly from one value of $\xi'$ to the next, so that the total
probability of $\xi$ having a value lying within the range $\xi'$ to $\xi' + d\xi'$, which is small
but still large compared with the interval between consecutive eigenvalues $\xi'$, will
be approximately

$$P = |c_{\xi'}|^2\,s'\,d\xi',$$

where $s'$ is the value of $s$ when $\xi'$ is the value of its variable. With the same kind
of approximation we can replace the sum in (24) by an integral, which gives us


$$\psi = \int c_{\xi'}\,\psi_{\xi'}\,s'\,d\xi'. \qquad (25)$$


We must now introduce eigen-$\psi$'s, $\psi(\xi')$, that are normalized according to the rule
for the continuous case, i.e.


$$\int \phi(\xi')\,\psi(\xi'')\,d\xi'' = 1. \qquad (26)$$


The change in the representatives caused by this change in the normalization of
the fundamental $\psi$'s will be of the same nature as that studied in §24 caused
by a change in the weight function, except that in the present case in the limit
the change is infinite.

To compare (26) with (23), we deduce from (23) the equation

$$\sum_{\xi''} \phi_{\xi'}\psi_{\xi''} = 1,$$

which, written with an integral instead of a sum, gives

$$\int \phi_{\xi'}\psi_{\xi''}s''\,d\xi'' = 1.$$

Since the integrand here vanishes except when $\xi'' = \xi'$, we can replace $s''$ by
$(s's'')^{1/2}$. Thus we can take

$$\phi(\xi') = s'^{1/2}\phi_{\xi'}, \qquad \psi(\xi'') = s''^{1/2}\psi_{\xi''},$$

and equation (26) will be satisfied. We now get from (25)

$$\psi = \int c_{\xi'}\psi(\xi')s'^{1/2}\,d\xi' = \int \psi(\xi')\,d\xi'\,(\xi'|),$$

where $(\xi'|)$, the representative of $\psi$ according to the rule for the continuous case,
has the value

$$(\xi'|) = c_{\xi'}s'^{1/2}.$$

The probability $P$ now becomes $|(\xi'|)|^2\,d\xi'$. Thus the square of the modulus of the
representative gives the probability, per unit range of $\xi$, of $\xi$ having a given value.
In the case when there are several observables $\xi_r$, it may be shown in the same way
that the probability of each $\xi_r$ having a value between $\xi'_r$ and $\xi'_r + d\xi'_r$ is

$$P = |(\xi'|)|^2\,d\xi'_1\,d\xi'_2 \ldots = |(\xi'|)|^2\,d\xi'. \tag{27}$$
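
The passage from the discrete to the continuous normalization can be checked numerically.
The following is only a sketch and is not part of the original text: the spacing of
the eigenvalues, the function `shape` and all other names are hypothetical choices, made
so that the density $s$ varies with $\xi'$ while the coefficients vary slowly. It compares
the sum of $|c_{\xi'}|^2$ over a range with the integral of $|(\xi'|)|^2$ over the same range.

```python
import math

N = 2000  # number of discrete eigenvalues (a hypothetical choice)

# Unevenly spaced eigenvalues on [1, 4): xi_i = (1 + i/N)^2.
xi = [(1.0 + i / N) ** 2 for i in range(N)]

def density(x):
    """Number of eigenvalues per unit range: s = di/dxi = N / (2 sqrt(x))."""
    return N / (2.0 * math.sqrt(x))

def shape(x):
    """Slowly varying, unnormalized continuous representative (an assumption)."""
    return math.exp(-(x - 2.5) ** 2)

# Discrete coefficients c = (xi'|) / s'^(1/2), normalized so that sum |c|^2 = 1.
c = [shape(x) / math.sqrt(density(x)) for x in xi]
norm = math.sqrt(sum(abs(v) ** 2 for v in c))
c = [v / norm for v in c]

def representative(x):
    """Continuous representative (xi'|) = c s'^(1/2), with the same normalization."""
    return shape(x) / norm

def prob_discrete(a, b):
    """Sum of |c|^2 over the eigenvalues lying in [a, b)."""
    return sum(abs(ci) ** 2 for ci, x in zip(c, xi) if a <= x < b)

def prob_continuous(a, b, steps=20000):
    """Midpoint-rule integral of |(xi'|)|^2 over [a, b)."""
    h = (b - a) / steps
    return sum(abs(representative(a + (k + 0.5) * h)) ** 2 for k in range(steps)) * h

# The two numbers should agree closely when the eigenvalues are dense.
print(prob_discrete(2.0, 3.0), prob_continuous(2.0, 3.0))
```

The agreement improves as `N` grows, in line with the assumption that the range $d\xi'$
contains many eigenvalues.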


Suppose now, in this case of continuous $\xi'$’s, that we take for $\psi$ one of
the fundamental $\psi$’s, $\psi(\eta')$, of the new representation $\eta$ and suppose the $\eta'$’s to
take on discrete sets of values. The normalizing conditions (15) and (16) now become

$$\sum_{\eta'} (\xi'|\eta')(\eta'|\xi'') = \delta(\xi' - \xi''), \tag{28}$$

$$\int (\eta'|\xi')\,d\xi'\,(\xi'|\eta'') = \delta_{\eta'\eta''}. \tag{29}$$

These are just the correct normalizing conditions for us to be able to apply
the result (27). This is because the first of them gives

$$\phi(\xi')\psi(\xi'') = \delta(\xi' - \xi''), \tag{30}$$

[since equation (28) is just equation (30) written in terms of $\eta$-representatives
instead of abstract symbols] showing that the fundamental $\psi$’s of
the $\xi$-representation are normalized in accordance with (26); while the second
of them gives

$$\phi(\eta')\psi(\eta'') = \delta_{\eta'\eta''} \tag{31}$$

[since equation (29) is just equation (31) written in terms of $\xi$-representatives
instead of abstract symbols], showing that $\phi(\eta')\psi(\eta') = 1$ or that $\psi(\eta')$ is
correctly normalized for the purpose of physical interpretations. Hence we have
the result that

$$|(\xi'|\eta')|^2\,d\xi' \tag{32}$$

is the probability of the $\xi$’s having values between $\xi'$ and $\xi' + d\xi'$ when the $\eta$’s are
given to have the values $\eta'$. The transformation function is still a sort of probability
amplitude. From (29) we obtain

$$\int |(\xi'|\eta')|^2\,d\xi' = 1,$$

which shows that the total probability of $\xi$ having any value is unity, giving a check
on the normalizing conditions.

When both the $\eta$’s and the $\xi$’s take on continuous ranges of values,
the transformation function can no longer be used to give actual probabilities
in any convenient way. It will still, however, give relative probabilities. Even when
$(\xi'|\eta')$ is not normalized with respect to $\eta'$ correctly for physical interpretations,
the expression (32) will still give the probability of the $\xi$’s having values between
$\xi'$ and $\xi' + d\xi'$, apart from a factor independent of $\xi'$. It will be found in
the applications that such relative probabilities are all that is then required.

The two main types of problem in quantum mechanics are to determine
the possible results of an experiment and to determine the probability of occurrence
of one of these possible results under given initial conditions. The first type consists
in calculating the eigenvalues of an observable, while the second always reduces
to calculating a probability amplitude or transformation function and taking
the square of its modulus. A general method for calculating the transformation
function connecting a set of $\xi$’s with a set of $\eta$’s, when algebraic relations between
the $\xi$’s and $\eta$’s are given, is as follows. First obtain the matrices $(\xi'|\eta_r|\xi'')$
representing the $\eta$’s in the $\xi$-representation, the only conditions that these matrices
need satisfy being the given algebraic relations. One can now use the equations

$$\int (\xi'|\eta_r|\xi'')\,d\xi''\,(\xi''|\eta') = (\xi'|\eta')\,\eta'_r,$$

which follow at once from (21) and (22). These are linear integral equations
in the variables $\xi'$ for the unknowns $(\xi'|\eta')$. They are, in fact, the standard
equations of the theory of eigenvalues and the solutions, when normalized, are
just the transformation functions. These solutions are often called eigenfunctions
of the matrix $(\xi'|\eta_r|\xi'')$, which determines them. An application of this method
will be made in §35 to a case in which the integral equations reduce to differential
equations on account of $(\xi'|\eta_r|\xi'')$ involving the $\delta$ function and its derivatives.
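
The method can be illustrated by its simplest discrete analogue, in which $\xi'$ takes only
two values and the integral becomes a sum over two terms. The sketch below is an
illustration added here, not the author's procedure: the function name, the restriction
to a real symmetric 2 by 2 matrix and the sample matrix are all assumptions. It solves
the eigenvalue equations in closed form and normalizes the solutions, whose squared
moduli are then the probabilities of the two values of $\xi'$.

```python
import math

def transformation_function(a, b, d):
    """Solve sum_xi'' (xi'|eta|xi'')(xi''|eta') = eta' (xi'|eta') for [[a, b], [b, d]].

    Returns a list of (eta', (x, y)) pairs, where (x, y) is the normalized
    column (xi'|eta') over the two values of xi'. Assumes b != 0.
    """
    if b == 0:
        raise ValueError("matrix is already diagonal")
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    columns = []
    for eta in (mean - radius, mean + radius):
        x, y = b, eta - a              # satisfies (a - eta) x + b y = 0
        norm = math.hypot(x, y)
        columns.append((eta, (x / norm, y / norm)))
    return columns

for eta, (x, y) in transformation_function(0.0, 1.0, 0.0):
    # |(xi'|eta')|^2 is the probability of each xi' when eta has the value eta'
    print(eta, x * x, y * y)
```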


29. Example 


We have seen in §26 that if we have any set of observables $\xi_r$ that commute
with one another, then there exists a representation, called the $\xi$-representation,
in which each of them is represented by a diagonal matrix, whose diagonal elements
are then its eigenvalues. This fact is of very great value in applications of the theory
and usually forms the starting-point in any calculation of representatives.
To illustrate how it is used, two simple examples will now be given, which will
later be found to be of physical importance.

The first will concern the observables p and q satisfying

$$qp - pq = i,$$

which were introduced in §12. Our problem will be to find the eigenvalues of
$p^2 + q^2$. We shall assume that p and q are both real observables. We can then infer
by an elementary argument that $p^2 + q^2$ cannot have any negative eigenvalues.
We see that the eigenvalues of $p^2$ cannot be negative, since they are the squares of
the eigenvalues of p, which are all real. It follows that the average value of $p^2$ for
any state $\psi$ cannot be negative. Similarly the average of $q^2$ for this state $\psi$ cannot
be negative. Hence the average of $p^2 + q^2$ for the state $\psi$ also cannot be negative.
Thus $p^2 + q^2$ cannot have a negative eigenvalue, since if it did it would have
a negative average value, equal to this eigenvalue, for the corresponding eigenstate.


Let

$$A = (p + iq)(p - iq) = p^2 + q^2 + i(qp - pq) = p^2 + q^2 - 1.$$

We then have

$$(p - iq)(p + iq) = p^2 + q^2 + 1 = A + 2,$$

and hence

$$A(p + iq) = (p + iq)(p - iq)(p + iq) = (p + iq)(A + 2).$$

We now rewrite this equation in terms of the representatives of the symbols it
involves, in a representation in which A is diagonal. This gives

$$\sum_{A'''} (A'|A|A''')(A'''|p + iq|A'') = \sum_{A'''} (A'|p + iq|A''')(A'''|A + 2|A''),$$

which, since $(A'|A|A'') = A'\delta_{A'A''}$, reduces to

$$A'(A'|p + iq|A'') = (A'|p + iq|A'')(A'' + 2).$$

Hence either $(A'|p + iq|A'') = 0$ or $A' = A'' + 2$.

We have by a direct application of the matrix law of multiplication, where $A'$
is any eigenvalue of A,

$$(A'|(p + iq)(p - iq)|A') = \sum_{A''} (A'|p + iq|A'')(A''|p - iq|A'), \tag{33}$$

the summation being extended over all eigenvalues $A''$. But we have seen that
$(A'|p + iq|A'')$ vanishes unless $A' = A'' + 2$. Thus all the terms in the summation
vanish except the one for which $A'' = A' - 2$. If, now, $A' - 2$ is not an eigenvalue
of A, then all the terms in the summation will vanish without exception, and we
shall have

$$0 = (A'|(p + iq)(p - iq)|A') = (A'|A|A') = A'.$$

We have therefore obtained the result that if $A'$ is any eigenvalue of A, either $A' - 2$
is another eigenvalue or $A' = 0$. Thus if $A'$ is any eigenvalue, we shall have the
series of eigenvalues $A'$, $A' - 2$, $A' - 4$, $A' - 6$, ..., which cannot extend to $-\infty$ since,
as we have already seen, there can be no negative eigenvalues for $p^2 + q^2$, which
is equal to $A + 1$. This series of eigenvalues must therefore terminate, and can
terminate only with the value zero. Thus the eigenvalues of A are 0, 2, 4, 6, ...,
and those of $p^2 + q^2$ are 1, 3, 5, 7, ....

The representatives of p and q can now easily be obtained. Equation (33)
reduces to

$$A' = (A'|p + iq|A' - 2)(A' - 2|p - iq|A').$$

The two factors on the right here are conjugate complex quantities, on account of
equation (4). Hence

$$(A'|p + iq|A' - 2) = A'^{1/2}e^{i\gamma'},$$

where $\gamma'$ is a real function of $A'$. All the elements not of this type of
the matrix representing $p + iq$ vanish. The conjugate complex observable $p - iq$ is
represented by

$$(A' - 2|p - iq|A') = A'^{1/2}e^{-i\gamma'},$$

with all the matrix elements not of this type vanishing. Hence

$$(A'|p|A' - 2) = \tfrac{1}{2}A'^{1/2}e^{i\gamma'}, \qquad (A'|q|A' - 2) = -\tfrac{1}{2}iA'^{1/2}e^{i\gamma'},$$

$$(A' - 2|p|A') = \tfrac{1}{2}A'^{1/2}e^{-i\gamma'}, \qquad (A' - 2|q|A') = \tfrac{1}{2}iA'^{1/2}e^{-i\gamma'}, \tag{34}$$

and all the matrix elements representing p and q that are not of these types vanish.
The occurrence of the arbitrary phase $\gamma'$ in these representatives for p and q is
in accordance with the remark of §26, that a representation is not completely
determined by the observables that are represented by diagonal matrices.

The eigenvalues of A form, as we have seen, a discrete set and hence
in the representation with A diagonal the number of fundamental states is
enumerable. This is rather remarkable in view of the fact that we can obtain
another representation in which the number of fundamental states is equal to
the number of points on a line, for example, the representation in which p is
diagonal, since, as shown in §19, the eigenvalues of p include all numbers from
$-\infty$ to $\infty$. Thus by counting the number of independent states of a system in
different ways, one may obtain different cardinal numbers as result.
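
The representatives found in this example can be checked with finite matrices. The sketch
below is an illustration only, under stated assumptions: the arbitrary phase is taken to
be zero, and the enumerable set of fundamental states is cut off after `N` of them, which
spoils the last diagonal element (the cut-off is an artefact of the sketch, not of the
theory). The names `up`, `down` and `matmul` are hypothetical.

```python
import math

N = 6  # keep only the eigenvalues A' = 0, 2, ..., 2(N - 1)

def matmul(X, Y):
    """Product of two N x N matrices stored as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

# (A'|p + iq|A' - 2) = A'^(1/2), taking the arbitrary phase to be zero.
up = [[0j] * N for _ in range(N)]
for n in range(1, N):
    up[n][n - 1] = complex(math.sqrt(2 * n))

# p - iq is the conjugate transpose of p + iq.
down = [[up[j][i].conjugate() for j in range(N)] for i in range(N)]

# p = [(p + iq) + (p - iq)] / 2 and q = [(p + iq) - (p - iq)] / 2i, as in (34).
p = [[(up[i][j] + down[i][j]) / 2 for j in range(N)] for i in range(N)]
q = [[(up[i][j] - down[i][j]) / 2j for j in range(N)] for i in range(N)]

qp, pq = matmul(q, p), matmul(p, q)
pp, qq = matmul(p, p), matmul(q, q)

commutator_diag = [qp[i][i] - pq[i][i] for i in range(N)]
sum_sq_diag = [(pp[i][i] + qq[i][i]).real for i in range(N)]

# Omit the last entry, which is affected by the cut-off.
print(commutator_diag[:-1])   # each entry should be close to i
print(sum_sq_diag[:-1])       # should be close to 1, 3, 5, ...
```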
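

30. Second Example


Our second example will concern three observables $\alpha$, $\beta$, $\gamma$ that satisfy

$$\alpha\beta - \beta\alpha = i\gamma, \qquad \beta\gamma - \gamma\beta = i\alpha, \qquad \gamma\alpha - \alpha\gamma = i\beta. \tag{35}$$

Let

$$\alpha^2 + \beta^2 + \gamma^2 = \theta.$$

Our problem will be to determine the eigenvalues of $\alpha$, $\beta$, $\gamma$ and $\theta$. We shall assume
$\alpha$, $\beta$ and $\gamma$ are real. We can then infer that $\theta$ cannot have any negative eigenvalues,
by a similar argument to that at the beginning of our previous example.

We have

$$\gamma\alpha^2 - \alpha^2\gamma = (\gamma\alpha - \alpha\gamma)\alpha + \alpha(\gamma\alpha - \alpha\gamma) = i\beta\alpha + i\alpha\beta$$

from the third of equations (35). Similarly

$$\gamma\beta^2 - \beta^2\gamma = (\gamma\beta - \beta\gamma)\beta + \beta(\gamma\beta - \beta\gamma) = -i\alpha\beta - i\beta\alpha.$$

Hence

$$\gamma(\alpha^2 + \beta^2) - (\alpha^2 + \beta^2)\gamma = 0,$$

so that

$$\gamma\theta - \theta\gamma = 0.$$

Thus $\theta$ commutes with $\gamma$, and therefore from symmetry it commutes also with $\alpha$
and $\beta$. Hence it commutes with any function of $\alpha$, $\beta$ and $\gamma$.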

We thus have an observable $\theta$ commuting with all the observables that occur
in the problem. Whenever we find an observable having this property, we should
expect to be able to treat it simply as a number in all subsequent investigations,
as by so doing we do not invalidate any of the algebraic equations that it
satisfies. A formal proof of the legitimacy of this proceeding is as follows. We use
a representation in which $\theta$ is diagonal, together with certain other observables,
$\kappa$ say, so that any observable P is represented by $(\theta'\kappa'|P|\theta''\kappa'')$. From the condition

$$\theta P - P\theta = 0$$

we obtain

$$\theta'(\theta'\kappa'|P|\theta''\kappa'') - (\theta'\kappa'|P|\theta''\kappa'')\theta'' = 0.$$

Hence

$$(\theta'\kappa'|P|\theta''\kappa'') = 0$$

unless $\theta' = \theta''$. Thus all the matrix elements representing any observable in
the problem vanish unless they are of the type $(\theta'\kappa'|P|\theta'\kappa'')$. It follows that when
any equation between the observables is expressed in terms of their representatives,
all the matrix elements throughout the equation will refer to one and the same
value of $\theta'$. This value for $\theta'$ need not be explicitly referred to in the notation
for the matrix elements, so that we may write $(\theta'\kappa'|P|\theta'\kappa'')$ simply as $(\kappa'|P|\kappa'')$.
The equations will now be of exactly the same form as if $\theta$ were a number, equal
to this $\theta'$, and we used a representation defined by the $\kappa$’s without the help of $\theta$.

We shall apply this method to the present example. Thus we shall consider $\theta$
to be a definite number and on this basis work out the eigenvalues of $\gamma$. Those of
$\alpha$ and $\beta$ will be the same, from symmetry. Any numerical value that we give to $\theta$
which is consistent with equations (35) will be an eigenvalue of $\theta$. Since $\alpha$ and $\beta$
are real, we can infer that the average value of $\gamma^2$ for any state cannot exceed $\theta$ and
hence the eigenvalues of $\gamma^2$ cannot exceed $\theta$. Thus the eigenvalues of $\gamma$ cannot be
greater than $\theta^{1/2}$ or less than $-\theta^{1/2}$. The fact that any numerical value that we take
for $\theta$ must be positive or zero, since, as we have already seen, any eigenvalue of $\theta$
must be positive or zero, makes this restriction on the eigenvalues of $\gamma$ reasonable.
We have from (35)

$$(\alpha + i\beta)\gamma - \gamma(\alpha + i\beta) = -i\beta - \alpha = -(\alpha + i\beta)$$

or

$$(\alpha + i\beta)\gamma = (\gamma - 1)(\alpha + i\beta).$$

If we express this result in the $\gamma$-representation, we get

$$(\gamma'|\alpha + i\beta|\gamma'')\gamma'' = (\gamma' - 1)(\gamma'|\alpha + i\beta|\gamma'').$$

Hence either $(\gamma'|\alpha + i\beta|\gamma'') = 0$ or $\gamma'' = \gamma' - 1$. Now if $\gamma'$ is any eigenvalue of $\gamma$,

$$(\gamma'|(\alpha + i\beta)(\alpha - i\beta)|\gamma') = \sum_{\gamma''} (\gamma'|\alpha + i\beta|\gamma'')(\gamma''|\alpha - i\beta|\gamma'), \tag{36}$$

the summation being over all eigenvalues $\gamma''$. The terms on the right-hand side all
vanish except the one for which $\gamma'' = \gamma' - 1$. If $\gamma' - 1$ is not an eigenvalue of $\gamma$,
then they all vanish and we have

$$(\gamma'|(\alpha + i\beta)(\alpha - i\beta)|\gamma') = 0.$$

But

$$(\alpha + i\beta)(\alpha - i\beta) = \alpha^2 + \beta^2 - i(\alpha\beta - \beta\alpha) = \theta - \gamma^2 + \gamma = \theta + \tfrac{1}{4} - (\gamma - \tfrac{1}{2})^2.$$

Hence if $\gamma' - 1$ is not an eigenvalue of $\gamma$, we have

$$0 = (\gamma'|\theta + \tfrac{1}{4} - (\gamma - \tfrac{1}{2})^2|\gamma') = \theta + \tfrac{1}{4} - (\gamma' - \tfrac{1}{2})^2$$

or

$$\gamma' = \tfrac{1}{2} \pm k,$$

where k is defined as the positive square root

$$k = (\theta + \tfrac{1}{4})^{1/2}. \tag{37}$$

Thus if $\gamma'$ is any eigenvalue, we shall have the series of eigenvalues $\gamma'$, $\gamma' - 1$,
$\gamma' - 2$, ..., which must terminate since there can be no eigenvalue less than $-\theta^{1/2}$.
The last member of the series must be either $\tfrac{1}{2} + k$ or $\tfrac{1}{2} - k$, and since there is
no eigenvalue greater than $\theta^{1/2}$ and thus none greater than k, it must be $\tfrac{1}{2} - k$.
Thus the eigenvalues of $\gamma$ are $\tfrac{1}{2} - k$, $\tfrac{3}{2} - k$, $\tfrac{5}{2} - k$, ....

If we reverse the order of the factors in the product whose representative occurs
on the left-hand side of (36), we can deduce by a similar argument that if $\gamma'$ is
any eigenvalue of $\gamma$, either $\gamma' + 1$ is another eigenvalue or $\gamma' = -\tfrac{1}{2} + k$, and we can
infer from this that the eigenvalues of $\gamma$ are $k - \tfrac{1}{2}$, $k - \tfrac{3}{2}$, $k - \tfrac{5}{2}$, .... By combining
these two results, we see that $\tfrac{1}{2} - k$ and $k - \tfrac{1}{2}$ must differ by an integer, so that k
must be an integer or half an odd integer. The eigenvalues of $\gamma$ are then

$$k - \tfrac{1}{2},\; k - \tfrac{3}{2},\; k - \tfrac{5}{2},\; \ldots,\; -k + \tfrac{3}{2},\; -k + \tfrac{1}{2}, \tag{38}$$

which shows incidentally that k must not be zero, as follows also from its defining
equation (37). The corresponding value for $\theta$ is $k^2 - \tfrac{1}{4}$, so that the eigenvalues of
$\theta$ are all of this form.

A new point that is brought out by this example is that if we have two
observables that commute and choose arbitrarily one of the eigenvalues of each,
then there will not necessarily exist a state for which each observable has its chosen
eigenvalue, i.e. a state that is a simultaneous eigenstate belonging to these two
eigenvalues. Thus the eigenvalues of $\gamma$ include all integers and half odd integers,
and those of $\theta$ include all numbers of the form $k^2 - \tfrac{1}{4}$ where k is an integer not zero
or half odd integer, but there exists a state for which $\gamma$ and $\theta$ have the values $\gamma'$ and
$k^2 - \tfrac{1}{4}$ respectively only provided $\gamma'$ is one of the numbers (38). Such restrictions
on the possible simultaneous eigenstates of two or more commuting observables do
not in any way invalidate our general theory.
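
The results of this example lend themselves to a short tabulation. The sketch below is
an illustration added here, with hypothetical function names; it uses exact fractions
to list the numbers (38) for a given k, to compute the corresponding $\theta = k^2 - \tfrac{1}{4}$, and
to test whether a chosen pair of eigenvalues can belong to a simultaneous eigenstate.

```python
from fractions import Fraction

def gamma_eigenvalues(k):
    """The numbers (38) for a given k: k - 1/2, k - 3/2, ..., -k + 1/2."""
    k = Fraction(k)
    if k <= 0 or (2 * k).denominator != 1:
        raise ValueError("k must be a positive integer or half an odd integer")
    top = k - Fraction(1, 2)
    return [top - j for j in range(int(2 * k))]

def theta_value(k):
    """The corresponding value of theta, k^2 - 1/4."""
    k = Fraction(k)
    return k * k - Fraction(1, 4)

def simultaneous_eigenstate_exists(gamma_value, k):
    """True only if gamma_value is one of the numbers (38) for this k."""
    return Fraction(gamma_value) in gamma_eigenvalues(k)

for k in (Fraction(1, 2), 1, Fraction(3, 2)):
    print(k, theta_value(k), gamma_eigenvalues(k))
```

For instance, $\tfrac{1}{2}$ is an eigenvalue of $\gamma$ and 2 is an eigenvalue of $\theta$ (with $k = \tfrac{3}{2}$), yet
`simultaneous_eigenstate_exists(Fraction(1, 2), Fraction(3, 2))` is false, since $\tfrac{1}{2}$ is
not among the numbers 1, 0, $-1$ given by (38).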


VI. EQUATIONS OF MOTION 
AND QUANTUM CONDITIONS 


31. General Remarks 


THE theory that has been developed so far contains a complete account of 
the new concepts and mathematical machinery required in quantum mechanics 
and also all the general physical laws. Only the general properties of states 
and observables have, however, been discussed, no reference being made to 
the particular conditions that they satisfy in the case of a specified dynamical 
system. We must now consider the form of these particular conditions and so 
make the theory applicable to given physical problems. It should be understood 
that the assumptions that will now be made are on quite a different footing from 
the foregoing ones. We are now concerned not with general physical laws applying 
to the whole of nature, but with special assumptions referring to a given physical 
problem, such as the interaction of a certain number of electrons and atomic nuclei. 
These assumptions will show how the information that we are dealing with a certain 
number of particles of given masses interacting according to given laws of force is 
to be made use of, and will give us equations which may be considered as forming 
the mathematical specification of which dynamical system is under consideration. 
Future developments of the theory may show that these assumptions are only 
approximate and require modifications; in fact, as they will now be formulated, 
they are not in agreement with the principle of relativity and will at any rate 
require modifications on this account when applied to rapidly moving particles. 
On the other hand, the assumptions of the four preceding chapters are so closely 
interconnected that one could hardly modify them in any way without getting an 
entirely different scheme of mechanics, and the successes of the theory are so great 
as to make it fairly certain that no such modifications will be required, at least for 
the purpose of explaining the ordinary physical and chemical properties of matter. 
The theory of these four chapters is in agreement with the principle of relativity; 
in fact it is so general that it is independent of any special relations between space 
and time. We must, of course, for this to be true, adopt a more general definition 
of an observable than the value of a variable at some instant of time, which we can 
do by considering an observable to be the quantity measured in any observation
and to be defined by the way the observation is made, together with the positions 
of the various component parts of the observing apparatus and the times when 
they are set working, if necessary. An observable now need not refer to an instant 
of time in some frame of reference, so that there is no conflict with relativity on 
this account. For the non-relativistic theory of the present chapter the previous 
definition of an observable is adequate. 

If we are dealing with a given dynamical system, we shall have given dynamical 
variables, whose values at any time are what we call observables, and we shall 
require conditions that will determine the values of these variables at all times 
when their values at some particular time are known. These conditions will 
be the equations of motion of the system. In classical mechanics they would 
be sufficient to form the mathematical specification of the dynamical system 
under consideration. This is not so, however, in quantum mechanics, where 
additional relations are necessary for this purpose, which take the form of equations 
connecting the values of the variables at a particular time, of such a nature that 
they can replace the commutative law of multiplication of the classical theory. 
These additional relations are called quantum conditions. It is only when
the quantum conditions are given as well as the equations of motion that we know as 
much about the variables as in the classical theory and can consider the dynamical 
system as mathematically completely specified. The equations of motion and 
quantum conditions are very closely connected with each other, and one cannot 
make any progress in solving a problem until they are both known. 

Our problem is now to determine the quantum conditions and equations of 
motion for any given dynamical system, such as that formed by given electrons and 
atomic nuclei interacting. It is known that classical mechanics gives an accurate 
description of dynamical systems under certain limiting conditions, e.g. when 
the masses are large. One would therefore expect to be able to obtain a theory 
of these systems when the limiting conditions do not hold by making some 
natural generalizations in the classical equations of motion and choosing quantum 
conditions that form natural generalizations of the classical conditions that all 
the variables commute. It will be found that one can in this way obtain 
a quantum theory of individual dynamical systems analogous to the classical 
theory. This quantum theory will not, however, include all the systems with which 
one has to deal, but only a large and important class of them, there being systems 
in the quantum theory which have no classical analogues (e.g. that consisting 
of a photon interacting with an atom, which will be treated in Chapter XII), 
for the treatment of which we must in each case choose special quantum conditions 
and equations of motion, either by means of special theoretical considerations or 
to fit experimental facts. 


32. Poisson Brackets


The classical equations of motion which we have to generalize may be written in
the form

$$\dot{q}_r = \frac{\partial H}{\partial p_r}, \qquad \dot{p}_r = -\frac{\partial H}{\partial q_r}, \tag{1}$$

where the q’s and p’s are a set of generalized co-ordinates and their canonically
conjugate momenta and H is the Hamiltonian, which is a given function of
the q’s and p’s for a given dynamical system and is equal to the energy when
it does not involve the time explicitly. These equations of motion involve partial
differential coefficients, which in general have no meaning for dynamical variables
in the quantum theory. We get over this difficulty by observing that the equations
of motion (1), and also all other important equations of general classical dynamics,
can be written in a form in which they involve partial differential coefficients only
through Poisson Bracket expressions, and that, as we shall now find, these bracket
expressions have their analogues in the quantum theory. Any two variables $\xi$
and $\eta$ have a Poisson Bracket (abridged to P.B.), denoted by $[\xi, \eta]$ and defined in
the classical theory by

$$[\xi, \eta] = \sum_r \left\{ \frac{\partial\xi}{\partial q_r}\frac{\partial\eta}{\partial p_r} - \frac{\partial\xi}{\partial p_r}\frac{\partial\eta}{\partial q_r} \right\}. \tag{2}$$

These P.B’s owe their importance to the fact that they remain invariant under
a contact transformation (i.e. a transformation to a new set of canonical variables
$p^*_r$, $q^*_r$ such that the form of the equations of motion (1) remains unaltered), which
results in the equations of motion being expressible in terms of P.B’s. We have
in fact

$$\dot{q}_r = [q_r, H], \qquad \dot{p}_r = [p_r, H], \tag{3}$$

and more generally, for any variable $\xi$,

$$\dot{\xi} = \sum_r \left\{ \frac{\partial\xi}{\partial q_r}\dot{q}_r + \frac{\partial\xi}{\partial p_r}\dot{p}_r \right\} = \sum_r \left\{ \frac{\partial\xi}{\partial q_r}\frac{\partial H}{\partial p_r} - \frac{\partial\xi}{\partial p_r}\frac{\partial H}{\partial q_r} \right\} = [\xi, H]. \tag{4}$$


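To find the quantum analogues of these P.B’s we shall note some of their general
properties and try to choose the quantum P.B’s so that they shall have the same
properties. The following relations follow at once from the definition (2).

$$[\xi, \eta] = -[\eta, \xi], \tag{5}$$

$$[\xi, c] = 0, \tag{6}$$

where c is a number,

$$[\xi_1 + \xi_2, \eta] = [\xi_1, \eta] + [\xi_2, \eta], \qquad [\xi, \eta_1 + \eta_2] = [\xi, \eta_1] + [\xi, \eta_2], \tag{7}$$

$$[\xi_1\xi_2, \eta] = \sum_r \left\{ \left( \frac{\partial\xi_1}{\partial q_r}\xi_2 + \xi_1\frac{\partial\xi_2}{\partial q_r} \right)\frac{\partial\eta}{\partial p_r} - \left( \frac{\partial\xi_1}{\partial p_r}\xi_2 + \xi_1\frac{\partial\xi_2}{\partial p_r} \right)\frac{\partial\eta}{\partial q_r} \right\} = [\xi_1, \eta]\xi_2 + \xi_1[\xi_2, \eta],$$

$$[\xi, \eta_1\eta_2] = [\xi, \eta_1]\eta_2 + \eta_1[\xi, \eta_2]. \tag{8}$$

Also the identity

$$[\xi, [\eta, \zeta]] + [\eta, [\zeta, \xi]] + [\zeta, [\xi, \eta]] = 0 \tag{9}$$

is easily verified. Equations (7) express that the P.B. $[\xi, \eta]$ involves $\xi$ and $\eta$ linearly,
while equations (8) correspond to the ordinary rules for differentiating a product.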

We can define the quantum P.B. so that it also has all these properties, provided
the order of the factors $\xi_1$ and $\xi_2$ in the first of equations (8) is preserved throughout
the equation, as in the way we have here written it, and similarly for the $\eta_1$ and
$\eta_2$ in the second of equations (8). These conditions are sufficient to determine the
form of the quantum P.B. uniquely, as may be seen from the following argument.
We can evaluate the P.B. $[\xi_1\xi_2, \eta_1\eta_2]$ in two different ways, since we can use either
of the two formulae (8) first, thus,

$$[\xi_1\xi_2, \eta_1\eta_2] = [\xi_1, \eta_1\eta_2]\xi_2 + \xi_1[\xi_2, \eta_1\eta_2]$$
$$= \{[\xi_1, \eta_1]\eta_2 + \eta_1[\xi_1, \eta_2]\}\xi_2 + \xi_1\{[\xi_2, \eta_1]\eta_2 + \eta_1[\xi_2, \eta_2]\}$$
$$= [\xi_1, \eta_1]\eta_2\xi_2 + \eta_1[\xi_1, \eta_2]\xi_2 + \xi_1[\xi_2, \eta_1]\eta_2 + \xi_1\eta_1[\xi_2, \eta_2]$$

and

$$[\xi_1\xi_2, \eta_1\eta_2] = [\xi_1\xi_2, \eta_1]\eta_2 + \eta_1[\xi_1\xi_2, \eta_2]$$
$$= [\xi_1, \eta_1]\xi_2\eta_2 + \xi_1[\xi_2, \eta_1]\eta_2 + \eta_1[\xi_1, \eta_2]\xi_2 + \eta_1\xi_1[\xi_2, \eta_2].$$

Equating these two results, we obtain

$$[\xi_1, \eta_1](\xi_2\eta_2 - \eta_2\xi_2) = (\xi_1\eta_1 - \eta_1\xi_1)[\xi_2, \eta_2].$$

Since this condition holds with $\xi_1$ and $\eta_1$ quite independent of $\xi_2$ and $\eta_2$,
we must have

$$\xi_1\eta_1 - \eta_1\xi_1 = i\hbar[\xi_1, \eta_1],$$

$$\xi_2\eta_2 - \eta_2\xi_2 = i\hbar[\xi_2, \eta_2],$$

where $\hbar$ must not depend on $\xi_1$ and $\eta_1$ or $\xi_2$ and $\eta_2$ and also must commute with
$(\xi_1\eta_1 - \eta_1\xi_1)$, so that it must be a number. We want the P.B. of two real variables
to be real, as in the classical theory, which requires that $\hbar$ shall be a real number
when introduced, as here, with the coefficient i. We are thus led to the following
general formula for the quantum P.B. $[\xi, \eta]$ of any two variables $\xi$ and $\eta$,

$$\xi\eta - \eta\xi = i\hbar[\xi, \eta], \tag{10}$$

in which $\hbar$ is a new universal constant having the dimensions of action. In order
that the theory may agree with experiment, we must take $\hbar$ equal to $h/2\pi$, where
h is the universal constant that was introduced by Max Planck. It is easily verified
that the quantum P.B. defined by (10) satisfies all the conditions (5), (6), (7),
(8) and (9). These conditions often provide a more convenient way of actually
evaluating a complicated P.B., by enabling one to express it in terms of simpler
P.B’s whose values may be known, than that afforded by a direct application
of (10).
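
The statement that the P.B. defined by (10) satisfies (5), (8) and (9) can be spot-checked
on small matrices. The following sketch is an illustration only: the three Hermitian
matrices standing for real observables are hypothetical, $\hbar$ is set to 1 for convenience,
and the helper names are assumptions. It also checks that the P.B. of two real
observables is itself real.

```python
HBAR = 1.0  # units with hbar = 1, chosen only for this illustration

def mul(X, Y):
    """Matrix product of two square matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(c, X):
    return [[c * a for a in row] for row in X]

def pb(X, Y):
    """Quantum P.B. defined by (10): [X, Y] = (XY - YX) / (i hbar)."""
    return scale(1 / (1j * HBAR), add(mul(X, Y), scale(-1, mul(Y, X))))

def close(X, Y, tol=1e-12):
    return all(abs(a - b) < tol for rx, ry in zip(X, Y) for a, b in zip(rx, ry))

def dagger(X):
    """Conjugate transpose; a real observable equals its own dagger."""
    n = len(X)
    return [[complex(X[j][i]).conjugate() for j in range(n)] for i in range(n)]

# Three hypothetical real (Hermitian) observables.
xi = [[1, 2j], [-2j, 3]]
eta = [[0, 1], [1, 0]]
zeta = [[2, 1 - 1j], [1 + 1j, -1]]

# (5): antisymmetry
antisymmetric = close(pb(xi, eta), scale(-1, pb(eta, xi)))
# First of (8): product rule with the order of the factors preserved
product_rule = close(pb(mul(xi, zeta), eta),
                     add(mul(pb(xi, eta), zeta), mul(xi, pb(zeta, eta))))
# (9): the cyclic identity
jacobi = close(add(add(pb(xi, pb(eta, zeta)), pb(eta, pb(zeta, xi))),
                   pb(zeta, pb(xi, eta))), [[0, 0], [0, 0]])
# The P.B. of two real observables is real
real_pb = close(pb(xi, eta), dagger(pb(xi, eta)))

print(antisymmetric, product_rule, jacobi, real_pb)
```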


33. Equations of Motion and Quantum Conditions 
obtained from Analogy with the Classical Theory 


The assumption that the P.B. defined by (10) is the analogue of the classical
one enables us to take over the classical equations of motion (3) and (4) into
the quantum theory and also any other classical equations expressible in terms
of P.B’s. Further, the assumption that the P.B’s of the p’s and q’s, which P.B’s
in the classical theory have the values

$$[q_r, q_s] = 0, \qquad [p_r, p_s] = 0, \qquad [q_r, p_s] = \delta_{rs}, \tag{11}$$

have these same values in the quantum theory, provides us with quantum
conditions, since we can now, with the help of (5), (6), (7), (8), evaluate
the P.B. $[\xi, \eta]$ of any two analytic functions $\xi$ and $\eta$ of the p’s and q’s and thus
obtain, by using (10), an equation for $\xi\eta - \eta\xi$ capable of replacing the classical
condition that $\xi\eta - \eta\xi = 0$. We have thus solved the problem of obtaining
equations of motion and quantum conditions forming a natural generalization of
the classical theory. The classical theory is, in fact, given by the limiting case
$\hbar \to 0$ of the quantum theory.*

The quantum conditions and equations of motion may be written without
the use of P.B’s, if we eliminate the P.B’s with the help of their defining equation
in the quantum theory, equation (10). We obtain in this way for the quantum
conditions (11)

$$q_rq_s - q_sq_r = 0, \qquad p_rp_s - p_sp_r = 0, \qquad q_rp_s - p_sq_r = i\hbar\delta_{rs}, \tag{12}$$

and for the equation of motion (4)

$$i\hbar\dot{\xi} = \xi H - H\xi. \tag{13}$$

The condition for a variable $\xi$ to be constant is that it shall commute with
the Hamiltonian H.


*Original: h = 0 
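
As an illustration that is not in the original text, the rules (5), (8) and (11) may be
applied to a system of one degree of freedom with the Hamiltonian $H = \tfrac{1}{2}(p^2 + q^2)$,
which is chosen here only as a convenient assumption. The order of the factors is
preserved throughout, as the second of equations (8) requires.

```latex
\begin{align*}
[q, p^2] &= [q, p]\,p + p\,[q, p] = 2p, &
[p, q^2] &= [p, q]\,q + q\,[p, q] = -2q, \\
\dot{q} &= [q, H] = \tfrac{1}{2}[q, p^2] = p, &
\dot{p} &= [p, H] = \tfrac{1}{2}[p, q^2] = -q.
\end{align*}
% By (10) the first result reads  q p^2 - p^2 q = 2 i \hbar p,
% an equation capable of replacing the classical condition q p^2 - p^2 q = 0.
```

These are of the same form as the classical equations of motion (1) for this Hamiltonian.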

The notion of P.B’s is more fundamental in the quantum theory than in
the classical theory, as is shown by the fact that one can define a P.B. in
the quantum theory without reference to a set of canonical variables, which
is not possible in the classical theory. For this same reason the notion of
a set of canonical variables is less important in the quantum theory than in
the classical theory. The notion of canonical variables is in the classical theory
a dynamical notion, but in the quantum theory it is merely an algebraic notion,
as the conditions defining when variables are canonical are then expressible by
algebraic equations (11) or (12). Equations (11) may be considered as defining
canonical variables also in the classical theory, but they then have no meaning
unless the $q_r$, $p_r$ are functions of another set of variables $q^*$, $p^*$ which are
given to be canonical, as otherwise the P.B’s are undefined. A transformation
from one set of canonical variables to another is called in the classical theory
a contact transformation, and this name may conveniently be taken over into
the quantum theory. The transformations discussed in §19 evidently do transform
one set of canonical variables into another, since, as shown in §19, they leave
algebraic relations between the variables unaltered and the conditions for variables
to be canonical in the quantum theory are algebraic.

It should be understood that the symbols q, p, &c., in the equations we are
now dealing with really denote the values q(t), p(t), &c., of the variables at some
particular time t that is not specifically mentioned, so that our equations are
equations between observables depending on a parameter t. The $\dot{\xi}$ in (4) and (13),
defined as the rate of change of the observable $\xi(t)$ with respect to the parameter t,
is also an observable. For observables $\xi(t)$, $\eta(t)$ depending on a parameter t,
we have the laws

$$\frac{d}{dt}(\xi + \eta) = \frac{d\xi}{dt} + \frac{d\eta}{dt},$$

$$\frac{d}{dt}(\xi\eta) = \frac{d\xi}{dt}\eta + \xi\frac{d\eta}{dt},$$

which are consistent with the general quantum equation of motion (4) or (13), on
account of their analogy with (7) and (8) respectively.

It is legitimate for us to assume the quantum conditions (11) or (12) only for
one particular time, and we must then deduce that they hold at all times from
the equations of motion. We can do this by observing that, if equations (11) or (12)
hold at one particular time t, then the time-rate of change of their left-hand sides
must vanish at time t, so that they will hold also at time $t + dt$, or alternatively
by observing that, from the general equation of motion (13), the values of the p’s
and q’s at time $t + dt$ are connected with their values at time t by an infinitesimal
contact transformation of the type (29) of §19. In order that we may be able
to consider the commutative law of multiplication of the classical theory as
completely replaced by our quantum equations, it is necessary that we should
be able to evaluate expressions of the form $\xi(t_1)\eta(t_2) - \eta(t_2)\xi(t_1)$. This we can do
by using the equations of motion to express $\xi(t_1)$ and $\eta(t_2)$ in terms of the p’s and
q’s at some one time t and then applying the quantum conditions (12).

The equation of motion (4) or (13) must be generalized when $\xi$ involves the time
explicitly as well as through the p’s and q’s. The classical generalization of (4) for
this case is

$$\dot{\xi} = [\xi, H] + \frac{\partial\xi}{\partial t},$$

which may be taken over directly into the quantum theory. The generalization of
(13) is thus

$$i\hbar\dot{\xi} = \xi H - H\xi + i\hbar\frac{\partial\xi}{\partial t}. \tag{15}$$


The Hamiltonian H is a constant when and only when it does not involve 
the time explicitly. The equations of motion are not affected by the addition 
to the Hamiltonian of an arbitrary numerical constant, even one that varies with 
the time. 

We are now in a position to be able to work out all that we require for any
dynamical system when this system is specified by a Hamiltonian function H, given
in terms of the q’s and p’s and perhaps also containing t explicitly. It should be
observed that the order of the factors of products in the expression for H may be
important, since our variables do not now all commute, so that there is a greater
variety of Hamiltonians in the quantum theory than in the classical theory.
Thus for a given Hamiltonian of the classical theory there is not in general
a unique corresponding Hamiltonian of the quantum theory, so that when one
is given a dynamical system in the classical theory it is in general meaningless
to talk about the same system in the quantum theory. There are, however,
exceptions to this, it being possible in many cases to use the same language for
describing dynamical systems in the quantum theory as in the classical theory
without practical ambiguity. For example, one can describe a dynamical system
as that of a particle of mass m moving in a field of force derivable from a potential
function V. The Hamiltonian for this system in the classical theory would be,
when expressed in Cartesian co-ordinates,

$$H = \frac{1}{2m}(p_x^2 + p_y^2 + p_z^2) + V(x, y, z).$$

One can without ambiguity say that the same system in the quantum theory is
that having this same Hamiltonian, since this Hamiltonian does not contain any
product of the type $xp_x$ for which the order of the factors is important. It should
be remarked that this freedom from ambiguity in the passage from a classical
Hamiltonian to a quantum one can be maintained only provided one uses always
Cartesian co-ordinates, as in general different quantum Hamiltonians would be
obtained, differing from one another by terms containing $\hbar$ as a factor, if one were
to take over the classical Hamiltonian expressed in different kinds of curvilinear
co-ordinates.


34. Schrödinger’s Form for
the Quantum Conditions


In this section and the following one some of the more important consequences
of the quantum conditions (12) will be obtained. We shall here be concerned
exclusively with the values of the variables q, p at one particular time, which will
not be specifically mentioned.

Equation (26) of Chapter II is, apart from the numerical factor $\hbar$, the same
as the quantum condition connecting any co-ordinate $q_r$ with its conjugate
momentum $p_r$. Thus we can take over the consequences of that equation and
apply them to our present $q_r$ and $p_r$, with insertion of the factor $\hbar$ where necessary.
Equation (27) of Chapter II gives us in this way

$$fp_r - p_rf = i\hbar\,\frac{df}{dq_r}, \tag{16}$$

where f is any function of $q_r$ expressible as a power series. This equation evidently
holds also when f is a function of the other q’s as well as $q_r$, provided the total
differential coefficient is replaced by a partial one. Again, from the argument at
the end of §19, we can infer that each $q_r$ and $p_r$ must have as eigenvalues all
numbers from $-\infty$ to $\infty$. This would actually be the case, for instance, if they
were Cartesian co-ordinates and momenta of particles.

It will now be shown that, ignoring a certain indefiniteness, one can give
a meaning to the operator $\partial/\partial q_r$ applied to a $\psi$-symbol, or one can differentiate
a $\psi$ with respect to an observable $q_r$. The simplest way of treating this problem is
to suppose the $\psi$ to be represented in a representation in which, amongst others,
the observable $q_r$ is diagonal. The representative $(q_r'|)$ of $\psi$ will be a function
of the variable $q_r'$, whose domain extends from $-\infty$ to $\infty$, and can therefore be
differentiated partially with respect to $q_r'$, giving another function
$\partial(q_r'|)/\partial q_r'$ of $q_r'$ defined for this same domain $-\infty$ to $\infty$. This new function
will represent a $\psi$-symbol, which we define to be $\partial\psi/\partial q_r$. It would, of course,
be strictly correct to say that one can give a meaning to the operator $\partial/\partial q_r$
applied to a $\psi$-symbol only provided for each $\psi$ there is one unique $\partial\psi/\partial q_r$,
i.e. provided the above procedure for obtaining $\partial\psi/\partial q_r$ gives a result independent
of which of all the possible representations in which $q_r$ is diagonal we use, and
this is not the case. There is thus an indefiniteness in the meaning of the operator
$\partial/\partial q_r$ applied to a $\psi$-symbol, the extent of which we shall now investigate.

Let us take first the case of a system of one degree of freedom, so that there is
only one co-ordinate $q$ and only one variable $q'$ in the representative $(q'|)$ of a $\psi$.
By differentiating this representative we obtain $\partial(q'|)/\partial q'$, the representative of
a possible $\partial\psi/\partial q$, say $(\partial\psi/\partial q)_a$. Now in the present one-dimensional case the most
general canonical transformation we can make such that $q$ remains diagonal is
that which involves the multiplying of the representative of any $\psi$-symbol by
an arbitrary phase factor. Thus the new representative of $\psi$ will be of the form

$$(q'|)^* = e^{iF'}(q'|), \tag{17}$$

where $F'$ is short for $F(q')$, a real function of $q'$. If we use this new representation
to define $\partial\psi/\partial q$, we obtain a new $\partial\psi/\partial q$, say $(\partial\psi/\partial q)_b$, whose representative in
the new representation is

$$\frac{\partial}{\partial q'}(q'|)^* = e^{iF'}\frac{\partial}{\partial q'}(q'|) + i\frac{dF'}{dq'}\,e^{iF'}(q'|).$$

The representative of $(\partial\psi/\partial q)_b$ in the original representation is therefore

$$e^{-iF'}\frac{\partial}{\partial q'}(q'|)^* = \frac{\partial}{\partial q'}(q'|) + i\frac{dF'}{dq'}(q'|),$$

and hence

$$\left(\frac{\partial\psi}{\partial q}\right)_b = \left(\frac{\partial\psi}{\partial q}\right)_a + i\frac{dF}{dq}\,\psi. \tag{18}$$

This is an equation giving the general connexion between two $\partial\psi/\partial q$’s. It shows
that the indefiniteness in the operator $\partial/\partial q$ consists in the possible addition of
an arbitrary imaginary* function of $q$.
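
The connexion (18) can be checked numerically. The following is a minimal sketch, not part of the original argument: the representative `psi`, the phase function `F` and the sample points are hypothetical choices, and derivatives are approximated by central differences.

```python
import cmath
import math

def ddq(func, q, eps=1e-5):
    """Central finite-difference derivative of a complex-valued function."""
    return (func(q + eps) - func(q - eps)) / (2 * eps)

# Hypothetical test data: a representative (q'|) and a real phase function F.
def psi(q):
    return cmath.exp(-q * q / 2) * (1 + 0.3j * q)

def F(q):
    return 0.7 * q ** 3 - math.sin(q)

def dF(q):
    return 2.1 * q ** 2 - math.cos(q)

def deriv_a(q):
    # (d psi/dq)_a: differentiate the original representative.
    return ddq(psi, q)

def deriv_b(q):
    # (d psi/dq)_b: differentiate the rephased representative of (17),
    # then return to the original representation by multiplying with exp(-iF).
    def psi_star(s):
        return cmath.exp(1j * F(s)) * psi(s)
    return cmath.exp(-1j * F(q)) * ddq(psi_star, q)

def residual(q):
    # Equation (18): (d psi/dq)_b - (d psi/dq)_a - i F'(q) psi should vanish.
    return abs(deriv_b(q) - deriv_a(q) - 1j * dF(q) * psi(q))

max_residual = max(residual(q) for q in [-1.5, -0.4, 0.0, 0.8, 2.0])
```

The residual is limited only by the finite-difference error, which suggests that the two definitions differ by exactly the imaginary function $i\,dF/dq$ multiplying $\psi$.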

In the $n$-dimensional case the general canonical transformation which leaves
a single $q$, $q_r$ say, diagonal is much more general than a mere change of phase
and thus the indefiniteness in the operator $\partial/\partial q_r$ is much greater than in
the one-dimensional case. Whenever we use this operator, however, we shall deal
not with a single $\partial/\partial q_r$ alone, but with the whole set
$\partial/\partial q_1, \partial/\partial q_2, \ldots, \partial/\partial q_n$ together, which will make only those meanings
for the operators useful that arise from a representation in which all the $q$’s are
simultaneously diagonal. The arbitrariness in this representation is then again
merely that of the phase, like (17), and leads again to the form (18) for
the connexion between two $(\partial/\partial q_r)$’s, namely

$$\left(\frac{\partial\psi}{\partial q_r}\right)_b = \left(\frac{\partial\psi}{\partial q_r}\right)_a + i\frac{\partial F}{\partial q_r}\,\psi, \tag{19}$$

where $F$ is now an arbitrary real function of all the $q$’s. Thus the indefiniteness in
the operators $\partial/\partial q_r$ now consists in the possible addition to each simultaneously
of a function of the $q$’s, of the form $i\,\partial F/\partial q_r$ for the $r$-th. This small amount of
indefiniteness has, however, been attained only by our considering each $\partial/\partial q_r$ as
not specified by the observable $q_r$ alone, but by $q_r$ as one of a given complete set
of commuting observables $q_1, q_2, \ldots, q_n$.














* ‘pure’ omitted
The operators $\partial/\partial q_r$ applied to $\psi$-symbols are linear operators that can be
applied to an arbitrary $\psi$ and are thus just ordinary observables. We shall
call $\partial/\partial q_r$, considered as an observable, $\pi_r$. The representative of $\pi_r$ in
the $q$-representation used for defining $\partial/\partial q_r$ is

$$(q'|\pi_r|q'') = \delta(q_1'-q_1'')\,\delta(q_2'-q_2'')\cdots\delta(q_{r-1}'-q_{r-1}'')\,\delta'(q_r'-q_r'')\,\delta(q_{r+1}'-q_{r+1}'')\cdots\delta(q_n'-q_n''), \tag{20}$$

which is similar to expression (49) of Chapter IV. The matrix representing $\pi_r$
is thus antisymmetrical, showing, according to equation (23) of §21, that $\pi_r$ is
an imaginary† observable. The form of (20) shows also that when $\pi_r$ is multiplied
into a $\phi$-symbol, the result is

$$\phi\pi_r = -\partial\phi/\partial q_r, \tag{21}$$

in which $\partial\phi/\partial q_r$ is defined through its $q$-representative in the same way as
$\partial\psi/\partial q_r$ was.

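The commutability relations connecting the $\pi$’s with each other and with the $q$’s
will now be obtained. For this purpose we use the fact, which is easily verified,
that the operators $\partial/\partial q_r$ applied to $\psi$-symbols obey the same laws as when applied
to ordinary functions. Thus

$$\frac{\partial^2\psi}{\partial q_r\,\partial q_s} = \frac{\partial^2\psi}{\partial q_s\,\partial q_r},$$

or

$$\pi_r\pi_s\psi = \pi_s\pi_r\psi,$$

and hence

$$\pi_r\pi_s - \pi_s\pi_r = 0. \tag{22}$$

Again

$$\frac{\partial}{\partial q_s}(q_r\psi) = q_r\frac{\partial\psi}{\partial q_s} + \delta_{rs}\psi,$$

or

$$\pi_s q_r\psi = q_r\pi_s\psi + \delta_{rs}\psi,$$

and hence

$$q_r\pi_s - \pi_s q_r = -\delta_{rs}. \tag{23}$$

More generally, if $f$ is any differentiable function of the $q$’s,

$$\frac{\partial}{\partial q_s}(f\psi) = f\frac{\partial\psi}{\partial q_s} + \frac{\partial f}{\partial q_s}\psi,$$

or

$$\pi_s f\psi = f\pi_s\psi + (\partial f/\partial q_s)\psi,$$

and hence

$$f\pi_s - \pi_s f = -\partial f/\partial q_s. \tag{24}$$

These relations (22), (23), (24)§ could have been obtained alternatively directly
from the representatives (20), with the help of properties of the $\delta$ function given
in §22.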
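
Relations (23) and (24) can be verified exactly in a simple special case. The sketch below assumes one degree of freedom and restricts $\psi$ and $f$ to polynomials in $q$, stored as coefficient lists; the particular polynomials are hypothetical test data, and `pi_op` stands for $\pi = \partial/\partial q$.

```python
def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists [c0, c1, ...]."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def poly_sub(a, b):
    """Difference a - b, padding the shorter list with zeros."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [x - y for x, y in zip(a, b)]

def pi_op(a):
    """The operator pi = d/dq acting on a polynomial in q."""
    return [k * c for k, c in enumerate(a)][1:] or [0]

def is_zero(a):
    return all(c == 0 for c in a)

q = [0, 1]                 # the polynomial "q"
f = [2, 0, -3, 1]          # hypothetical f(q) = 2 - 3 q^2 + q^3
psi = [1, 4, 0, -2, 5]     # hypothetical polynomial psi(q)

# (23) with r = s:  (q pi - pi q) psi = -psi
lhs_23 = poly_sub(poly_mul(q, pi_op(psi)), pi_op(poly_mul(q, psi)))
check_23 = is_zero(poly_sub(lhs_23, [-c for c in psi]))

# (24):  (f pi - pi f) psi = -(df/dq) psi
lhs_24 = poly_sub(poly_mul(f, pi_op(psi)), pi_op(poly_mul(f, psi)))
rhs_24 = [-c for c in poly_mul(pi_op(f), psi)]
check_24 = is_zero(poly_sub(lhs_24, rhs_24))
```

Integer arithmetic is used throughout, so both checks are exact rather than approximate.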





† ‘pure’ omitted
§ omitted
The relations (22) and (23) for the $\pi$’s are, apart from a numerical factor $-i\hbar$,
just the same as the quantum conditions (12) for the $p$’s. Thus the observables
$-i\hbar\pi_r$ satisfy the same commutability relations with each other and with the $q$’s
as do the $p_r$. Equation (24) now corresponds to (16), with the difference that (24)
has been shown to be valid for any differentiable function $f$, not merely for one
expressible as a power series. There exist many sets of observables $\pi_r$, owing to
the indefiniteness in $\partial/\partial q_r$ discussed above, but each such set must satisfy (22) and
(23) and thus give rise to a set of observables $-i\hbar\pi_r$ with the same commutability
properties as the $p_r$’s. Any one of these sets, $\pi_{rb}$, is connected with any other, $\pi_{ra}$,
according to equation (19), by the relation

$$\pi_{rb} = \pi_{ra} + i\,\partial F/\partial q_r. \tag{25}$$

It will now be shown that there exists one set of $\pi_r$’s such that $-i\hbar\pi_r$ is just
equal to $p_r$.

If we take any set of $\pi$’s, $\pi_{ra}$ say, then from (23) and (12), $p_s + i\hbar\pi_{sa}$ must
commute with each $q_r$ and must therefore be a function of the $q$’s only, i.e.

$$p_s + i\hbar\pi_{sa} = f_s(q). \tag{26}$$

Each $f_s$ must be a real function of the $q$’s, since both $p_s$ and $-i\hbar\pi_{sa}$ are real
observables. Again, from (12) and (22), we obtain

$$\begin{aligned}
0 &= p_r p_s - p_s p_r \\
  &= (-i\hbar\pi_{ra} + f_r)(-i\hbar\pi_{sa} + f_s) - (-i\hbar\pi_{sa} + f_s)(-i\hbar\pi_{ra} + f_r) \\
  &= -i\hbar\,(\pi_{ra}f_s + f_r\pi_{sa} - \pi_{sa}f_r - f_s\pi_{ra}),
\end{aligned}$$

or

$$\pi_{sa}f_r - f_r\pi_{sa} = \pi_{ra}f_s - f_s\pi_{ra}.$$

With the help of (24) we now find

$$\partial f_r/\partial q_s = \partial f_s/\partial q_r,$$

which shows that the functions $f_r$ are all of the form

$$f_r = \partial G/\partial q_r,$$

where $G$ is a function of the $q$’s independent of $r$. Thus (26) becomes

$$p_r = -i\hbar\pi_{ra} + \frac{\partial G}{\partial q_r} = -i\hbar\left(\pi_{ra} + \frac{i}{\hbar}\frac{\partial G}{\partial q_r}\right).$$

We can introduce a new set of $\pi$’s according to equation (25), taking $F$ equal to
$G/\hbar$, since $F$ is an arbitrary real function of the $q$’s and $G$ is real. For these new
$\pi_r$’s we shall then have

$$p_r = -i\hbar\pi_r. \tag{27}$$
Equation (27), which was discovered by Erwin Schrödinger, is a very
important one in applications of quantum mechanics. It is a consequence
only of the quantum conditions (12) and may be regarded as a new form in
which these quantum conditions may be expressed. It shows that we can take
a representation in which the $q$’s are diagonal and in which each observable $p_r$, when
multiplied into a $\psi$-symbol, is represented by the operator $-i\hbar\,\partial/\partial q_r'$ operating
on the representative $(q'|)$ of this $\psi$-symbol. When $p_r$ is multiplied towards the left
into a $\phi$-symbol, it is then represented by the operator $i\hbar\,\partial/\partial q_r'$ operating on
the representative $(|q')$ of this $\phi$-symbol. If $f(q_r, p_r)$ is any function of the $q$’s
and $p$’s, expressible as a power series in the $p$’s, then it is equivalent to

$$f(q_r, -i\hbar\pi_r), \tag{28}$$

obtained from $f(q_r, p_r)$ by substituting $-i\hbar\pi_r$ for each $p_r$. This is to be understood
as meaning that when $f$ is multiplied into a $\psi$-symbol, its representative
is the operator $f(q_r', -i\hbar\,\partial/\partial q_r')$ operating on the representative $(q'|)$ of this
$\psi$-symbol, and when multiplied into a $\phi$-symbol, its representative is the operator
$\tilde f(q_r', i\hbar\,\partial/\partial q_r')$ operating on the representative $(|q')$ of this $\phi$-symbol, where
$\tilde f$ is the function obtained from $f$ by reversing the order of all the factors in each
term. The equation for determining the eigenvalues $f'$ of $f$ is thus†

$$f\!\left(q_r', -i\hbar\frac{\partial}{\partial q_r'}\right)(q'|) = f'(q'|), \tag{29}$$

which is an ordinary partial differential equation for the unknown function $(q'|)$
and unknown number $f'$. When $f$ is the Hamiltonian or energy of the system
(assumed not to involve the time explicitly), this becomes Schrödinger’s equation
for the determination of the possible numerical values for the energy.

Equation (27) shows up the meaning of the indeterminacy in a representation
when only the observables that are to be diagonal in it are specified. Corresponding
to each representation in which the $q$’s are diagonal there exists one set of
observables conjugate to the $q$’s [i.e. satisfying the same conditions as the $p$’s
in (12)], whose representatives are of the specially simple form $-i\hbar\,\partial/\partial q_r'$ [when
multiplied into a representative $(q'|)$ of a $\psi$-symbol]. If we now take one particular
set of observables conjugate to the $q$’s and require that the representatives of
these shall be of the specially simple form $-i\hbar\,\partial/\partial q_r'$, the representation is then
completely determined, except for a trivial phase factor $e^{i\gamma}$, where $\gamma$ is independent
of the $q$’s, since the function $F$ in (25) is completely determined by the condition
that $-i\hbar\pi_r$ must equal $p_r$, except for an arbitrary constant. The indeterminacy
in a representation when only the diagonal observables are specified is of the
same nature, although it cannot be discussed in the same way, when any of these
diagonal observables has no canonical conjugate, as is the case, for instance, when
its eigenvalues do not extend from $-\infty$ to $\infty$.

† Original ‘’ replaced by ‘’

From equations (27) and (24) we see that (16) holds also for functions $f$ of
the $q$’s that are not expressible as power series.


35. The Transformation Function $(q'|p')$


The result (27) which we deduced with reference to the $q$-representatives of
$\psi$- or $\phi$-symbols must be applicable also to the transformation functions connecting
two representations of which one is the $q$-representation, since these transformation
functions are nothing but the representatives in either of the representations of
the fundamental $\psi$’s and $\phi$’s of the other. For instance the transformation function
$(q'|\alpha')$ is the $q$-representative of $\psi(\alpha')$. Hence from (27) the representative of
$p_r\psi(\alpha')$ is $-i\hbar\,\partial(q'|\alpha')/\partial q_r'$. This representative, equal to
$\int (q'|p_r|q'')\,dq''\,(q''|\alpha')$, may be written $(q'|p_r|\alpha')$ in the notation of mixed
representations of §27, so that we have

$$(q'|p_r|\alpha') = -i\hbar\,\partial(q'|\alpha')/\partial q_r'. \tag{30}$$

Similarly, if $f(q_r, p_r)$ is any function of the $q$’s and $p$’s expressible as a power series
in the $p$’s, we see from the result (28) that

$$(q'|f|\alpha') = f\!\left(q_r', -i\hbar\frac{\partial}{\partial q_r'}\right)(q'|\alpha'). \tag{31}$$

Again, the transformation function $(\alpha'|q')$ is the $q$-representative of $\phi(\alpha')$, so that,
remembering (21), we obtain from the result (27)

$$(\alpha'|p_r|q') = i\hbar\,\partial(\alpha'|q')/\partial q_r', \tag{32}$$

and from the result (28)

$$(\alpha'|f|q') = \tilde f\!\left(q_r', i\hbar\frac{\partial}{\partial q_r'}\right)(\alpha'|q'). \tag{33}$$

We shall now apply (30) to calculate the transformation function $(q'|p')$
connecting a co-ordinate $q$ with its conjugate momentum $p$. We have

$$(q'|p|p') = -i\hbar\,\partial(q'|p')/\partial q'.$$

But from equations (22) of Chapter V

$$(q'|p|p') = (q'|p')\,p'.$$

Hence

$$-i\hbar\,\partial(q'|p')/\partial q' = p'(q'|p').$$

This is a differential equation for the unknown function $(q'|p')$ of $q'$. Its general
solution is

$$(q'|p') = a'e^{iq'p'/\hbar},$$

where $a'$ is an arbitrary function of $p'$.

We can determine the modulus of $a'$ by using the normalizing condition

$$\int (p'|q')\,dq'\,(q'|p'') = \delta(p'-p'').$$

This gives, when we put

$$(p'|q') = \overline{(q'|p')} = \bar a' e^{-iq'p'/\hbar},$$

the equation

$$\bar a' a'' \int_{-\infty}^{\infty} e^{-iq'(p'-p'')/\hbar}\,dq' = \delta(p'-p''),$$

where $a''$ is the value of $a'$ when $p''$ is substituted for the $p'$ in it. By carrying out
the integration with respect to $q'$ we obtain

$$\begin{aligned}
\delta(p'-p'') &= \bar a' a''\left[\frac{i\hbar}{p'-p''}\,e^{-iq(p'-p'')/\hbar}\right]_{q=-\infty}^{q=\infty} \\
               &= \bar a' a''\,\frac{\hbar}{p'-p''}\Bigl[\sin q(p'-p'')/\hbar\Bigr]_{q=-\infty}^{q=\infty}.
\end{aligned}$$

Integrating each side with respect to $p''$, we now get

$$1 = 2\hbar\,\bar a'\int \frac{a''}{p'-p''}\Bigl[\sin q(p'-p'')/\hbar\Bigr]_{q=\infty}\,dp'' = 2\pi\hbar\,|a'|^2 = h\,|a'|^2.$$

Thus $a' = h^{-1/2}e^{i\gamma'}$, where $\gamma'$ is some real function of $p'$, and hence

$$(q'|p') = h^{-1/2}e^{i\gamma'}e^{iq'p'/\hbar}.$$

By suitably choosing the arbitrary phase in the $p$-representation we can remove
the phase factor $e^{i\gamma'}$, which will leave us with

$$(q'|p') = h^{-1/2}e^{iq'p'/\hbar}. \tag{34}$$

There is no arbitrary phase in the $q$-representation, since this phase is fixed when
we use equation (27) or (30).
Our result (34) shows that the $p$- and $q$-representatives of a $\psi$-symbol are given
in terms of one another by the relations

$$(p'|) = h^{-1/2}\int e^{-iq'p'/\hbar}\,dq'\,(q'|), \qquad (q'|) = h^{-1/2}\int e^{iq'p'/\hbar}\,dp'\,(p'|). \tag{35}$$

Thus either of them is given by the components in the Fourier resolution of
the other. The transformation function connecting the $n$ $q$’s, $q_1, q_2, \ldots, q_n$, with
their $n$ conjugate $p$’s, $p_1, p_2, \ldots, p_n$, is given by simple multiplication,

$$(q_1'q_2'\ldots q_n'|p_1'p_2'\ldots p_n') = (q_1'|p_1')(q_2'|p_2')\ldots(q_n'|p_n') = h^{-n/2}e^{i(p_1'q_1'+p_2'q_2'+\ldots+p_n'q_n')/\hbar}. \tag{36}$$
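
The pair of relations (35) can be tried out numerically for one degree of freedom. The sketch below is illustrative only: it assumes units with $\hbar = 1$ (so that $h = 2\pi$), a hypothetical Gaussian representative `q_rep`, and midpoint-rule integration on truncated grids whose limits are chosen by hand.

```python
import cmath
import math

HBAR = 1.0                 # assumption: units in which hbar = 1
H = 2 * math.pi * HBAR     # ordinary Planck constant, h = 2*pi*hbar

def midpoints(lo, hi, n):
    """Midpoint-rule grid on [lo, hi]: returns the sample points and the step."""
    step = (hi - lo) / n
    return [lo + (k + 0.5) * step for k in range(n)], step

def q_rep(q, p0=1.5):
    """Hypothetical normalised Gaussian packet (q'|) with mean momentum p0."""
    return math.pi ** -0.25 * cmath.exp(-q * q / 2 + 1j * p0 * q / HBAR)

qs, dq = midpoints(-10.0, 10.0, 800)
ps, dp = midpoints(-6.0, 9.0, 600)
q_values = [q_rep(q) for q in qs]

def p_rep(p):
    # First relation of (35): (p'|) = h^(-1/2) * integral of exp(-iq'p'/hbar) (q'|) dq'
    total = sum(cmath.exp(-1j * q * p / HBAR) * f for q, f in zip(qs, q_values))
    return H ** -0.5 * total * dq

p_values = [p_rep(p) for p in ps]

def q_back(q):
    # Second relation of (35): (q'|) = h^(-1/2) * integral of exp(iq'p'/hbar) (p'|) dp'
    total = sum(cmath.exp(1j * q * p / HBAR) * f for p, f in zip(ps, p_values))
    return H ** -0.5 * total * dp

norm_q = sum(abs(f) ** 2 for f in q_values) * dq
norm_p = sum(abs(f) ** 2 for f in p_values) * dp
roundtrip_error = max(abs(q_back(q) - q_rep(q)) for q in [-1.0, 0.0, 0.7, 2.0])
```

With the factor $h^{-1/2}$ of (34) both representatives have the same normalisation, and applying the two relations in succession recovers the original function to within the accuracy of the quadrature.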


36. The Space-displacement Operator 


In §34 we saw how to give a meaning to the operator $\partial/\partial q_r$ applied to a $\psi$-symbol.
For this purpose we had to make use of a representation in which $q_r$ is diagonal.
There are, however, certain cases in which one can give a meaning to this
operator independently of any representation, so that this meaning becomes of
more fundamental importance. These are the cases in which $q_r$ is the value ($x$ say)
at a particular time of one of the Cartesian co-ordinates of the particle when
the system consists of a single particle, or of the centre of gravity of the whole
system in the general case. The operator $\partial/\partial x$ applied to a state is then connected
with the operator of displacement of the state in the direction of the $x$-axis, as will
now be shown.

Let $\psi_1$ denote any state of the system, arising when the system is prepared in
a certain way. We now introduce that state $\psi_2$ which is the same as $\psi_1$ except for
being displaced through a distance $\delta x$ (a number) in the direction of the $x$-axis
at the time $t$. To define $\psi_2$ rigorously, we must suppose all the apparatus used
in the preparation of $\psi_1$ and all the external forces acting on the system up to
time $t$ to be displaced through this distance $\delta x$, the external forces after time $t$
being unchanged. The state of the system after time $t$, which state is completely
defined in this way, will then be $\psi_2$. We can now form the difference $\psi_2 - \psi_1$,
and divide by $\delta x$ and proceed to the limit $\delta x \to 0$. The result of this procedure
will be a $\psi$-symbol which depends in some linear way on our initial $\psi$-symbol $\psi_1$.
Thus we shall have

$$\lim_{\delta x \to 0}(\psi_2 - \psi_1)/\delta x = d_x\psi_1,$$

where $d_x$ is a linear operator, i.e. where

$$d_x(\psi_1 + \psi_3) = d_x\psi_1 + d_x\psi_3,$$

for arbitrary $\psi_1$ and $\psi_3$. Our displacement procedure thus enables us to define
a displacement operator $d_x$, which, being a linear operator that can be multiplied
into any $\psi$-symbol, can be regarded as an observable.

The displacement operator $d_x$ is not completely defined owing to the fact
that the $\psi$-symbol $\psi_2$ is undefined to the extent of an arbitrary numerical factor.
If we make the assumption that $\psi_2$ shall have the same ‘length’ as $\psi_1$, i.e. that

$$\phi_2\psi_2 = \phi_1\psi_1,$$

then this arbitrary factor will be of the form $e^{i\gamma}$, where $\gamma$ is a real number.
Thus if $\psi_2^*$ is any alternative $\psi_2$, we shall have $\psi_2^* = e^{i\gamma}\psi_2$. Our new
displacement operator $d_x^*$ will now be given by

$$\begin{aligned}
d_x^*\psi_1 &= \lim_{\delta x \to 0}(e^{i\gamma}\psi_2 - \psi_1)/\delta x \\
            &= \lim_{\delta x \to 0}\left\{\frac{\psi_2 - \psi_1}{\delta x} + \frac{e^{i\gamma} - 1}{\delta x}\,\psi_2\right\} \\
            &= d_x\psi_1 + ia\psi_1,
\end{aligned}$$

where $a$ is a real number, equal to the limit (assumed to exist) of $\gamma/\delta x$. Thus

$$d_x^* = d_x + ia,$$

so that the indefiniteness in our displacement operator consists of merely
an arbitrary imaginary number.

The series of operations by which, given any $\psi$-symbol $\psi$, we defined
the $\psi$-symbol $d_x\psi$ may be applied also to any $\phi$-symbol $\phi$ and will then give us
the $\phi$-symbol $d_x\phi$. When $d_x$ is regarded as an observable it can be multiplied into
a $\phi$-symbol to give a product $\phi d_x$. The connexion between $d_x\phi$ and $\phi d_x$ will now be
obtained. Any product of the form $\phi\psi$ is a number which must remain unchanged
when both the $\phi$ and $\psi$ are displaced through the distance $\delta x$, and hence

$$d_x(\phi\psi) = 0.$$

Since $d_x$ is of the nature of a differentiation, we can use the ordinary law for
the differential coefficient of a product, which gives us

$$(d_x\phi)\psi + \phi(d_x\psi) = 0.$$

When we consider $d_x$ as an observable, we have

$$\phi(d_x\psi) = (\phi d_x)\psi.$$

Hence

$$(d_x\phi)\psi = -(\phi d_x)\psi.$$

Since this is true for arbitrary $\psi$, we obtain

$$\phi d_x = -d_x\phi,$$

which is the required connexion. This result is analogous to (21). It shows us that
the conjugate imaginary symbol to $d_x\psi_1$, which is, of course, just $d_x\phi_1$, is equal to
$-\phi_1 d_x$, and hence allows us to infer that $d_x$ is an imaginary* observable, like the $\pi_r$
of (21).

We shall now obtain the connexion between our new operator $d_x$ and
the operator $\partial/\partial x$ defined according to §34. We take a representation in which
$x$ is diagonal. We suppose further that the phase of this representation is
independent of $x$, so that when a $\psi$-symbol is displaced in the direction of
the $x$-axis, its representative $(x'|)$ is merely displaced an equal distance through
the domain of the variable $x'$. (If the phase were arbitrary, then when the $\psi$-symbol
is displaced its representative would be changed in some more complicated way.)
The representatives $(x'|1)$ and $(x'|2)$ of $\psi_1$ and $\psi_2$ are now connected by the relation

$$(x'|2) = (x' - \delta x|1).$$

Thus the representative of $d_x\psi_1$ will be

$$\lim_{\delta x \to 0}\frac{(x' - \delta x|1) - (x'|1)}{\delta x} = -\frac{\partial}{\partial x'}(x'|1),$$

and hence

$$d_x = -\frac{\partial}{\partial x}. \tag{37}$$

Equation (37) holds, of course, only for one of the possible operators $\partial/\partial x$.
The others will differ from this one in accordance with equation (18). It will now
be shown that the one for which (37) holds is the same as the one which, considered
as an observable $\pi_x$, satisfies (27) or

$$p_x = -i\hbar\pi_x,$$

$p_x$ being the momentum conjugate to $x$. This will mean that, with $d_x$ considered
as an observable,

$$p_x = i\hbar d_x. \tag{38}$$





* ‘pure’ omitted
We prove this by observing that $p_x$ and $i\hbar d_x$ satisfy the same commutability
relations. When the $\psi$-symbol $x\psi_1$ is displaced through the distance $\delta x$ the result
must be $(x - \delta x)\psi_2$, in which $x$ has been changed into $x - \delta x$, since the displacement
of apparatus required for the definition of the displaced $\psi$-symbol causes apparatus
that measures the observable $x$ to become apparatus that measures $x - \delta x$.
Thus from the definition of $d_x$

$$d_x x\psi_1 = \lim_{\delta x \to 0}\{(x - \delta x)\psi_2 - x\psi_1\}/\delta x = x\,d_x\psi_1 - \psi_1.$$

Hence

$$d_x x - x d_x = -1. \tag{39}$$

In the same way it may be shown that $d_x$ commutes with $y$, $z$, $p_x$, $p_y$, $p_z$, and in
fact with every dynamical variable (at time $t$) independent of $x$. Thus $p_x - i\hbar d_x$
commutes with everything and must be a number. We may take this number
to be zero, on account of the arbitrary† number arising in the definition of $d_x$,
and thus obtain (38).
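
The relations (37) and (39) can be illustrated with a difference quotient in the $x$-representation. This is a rough sketch under stated assumptions: `psi1` is a hypothetical real representative $(x'|1)$, the displaced representative is taken to be $(x' - \delta x|1)$ as in the text, and the limit $\delta x \to 0$ is replaced by a small fixed step.

```python
import math

def psi1(x):
    """Hypothetical representative (x'|1): a Gaussian centred at x = 0.3."""
    return math.exp(-(x - 0.3) ** 2)

def dpsi1(x):
    """Analytic derivative of psi1, used only for comparison."""
    return -2 * (x - 0.3) * psi1(x)

def displaced(rep, dx):
    """Representative of the displaced state: (x'|2) = (x' - dx|1)."""
    return lambda x: rep(x - dx)

def d_x(rep, dx=1e-6):
    """Approximate d_x by the quotient (psi_2 - psi_1)/dx with a small dx."""
    shifted = displaced(rep, dx)
    return lambda x: (shifted(x) - rep(x)) / dx

def x_psi1(x):
    """Representative of the symbol x psi_1."""
    return x * psi1(x)

points = [-1.0, 0.0, 0.5, 1.7]

# (37): d_x acts on the representative as -d/dx.
err_37 = max(abs(d_x(psi1)(x) + dpsi1(x)) for x in points)

# (39): (d_x x - x d_x) psi_1 = -psi_1.
err_39 = max(abs(d_x(x_psi1)(x) - x * d_x(psi1)(x) + psi1(x)) for x in points)
```

Both errors are of the order of the step `dx`, as expected for a one-sided difference quotient.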

Equation (38), which connects our displacement operator $d_x$ with
the momentum $p_x$, is an alternative way of expressing the quantum conditions (12)
or (27), in so far as they refer to the centre of gravity of the whole system,
and is perhaps the most fundamental of all ways of expressing them, showing
most clearly the underlying physical assumption. This equation (38) is quite a
plausible assumption for one to make for one’s quantum conditions, apart from
the fact that it is derivable from equations (12), which were set up from analogy
with the classical theory, on account of its simplicity and generality and the fact
that it leads at once to the law of the conservation of momentum. When there
are no external forces acting on the system, we see from the definition of $d_x$ that
it does not depend on the time $t$. Equation (38) then shows that the momentum
does not depend on $t$ and is therefore constant.


37. The Time-displacement Operator


Corresponding to the space-displacement operator $d_x$ of the preceding section,
we now introduce an analogous time-displacement operator $d_t$, defined as follows.
If $\psi_1$ is any $\psi$-symbol, we form the time-displaced $\psi$-symbol $\psi_2$ by supposing all
the apparatus used in preparing $\psi_1$ to be set in motion a time $\delta t$ later and all
varying external forces acting on the system up to time $t$ to be retarded a time $\delta t$.
The state of the system after this time $t$ will then be our $\psi_2$. We now take the limit
of $(\psi_2 - \psi_1)/\delta t$ and define it to be $d_t\psi_1$. We can consider $d_t$ to be an observable and,
as in the case of $d_x$, can show that it is an imaginary§ observable and that it is
completely defined except for an arbitrary, imaginary§ numerical constant.

† ‘additive’ omitted

By means of this $d_t$ we shall deduce the equations of motion of the system.
In this way we shall establish the form of these equations without anywhere making
use of classical analogues. We introduce the real observable $H$ defined by

$$H = -i\hbar d_t. \tag{40}$$

Thus

$$H\psi = -i\hbar d_t\psi \tag{41}$$

for arbitrary $\psi$. If we now take any observable $\xi$ that is the value at time $t$ of some
dynamical variable and apply (41) to the $\psi$-symbol $\xi\psi$, we obtain

$$H\xi\psi = -i\hbar d_t\xi\psi.$$

We can evaluate the right-hand side here by the method used for deriving (39),
or more directly by making use of the fact that the ordinary law for the
differentiation of a product applies to the operator $d_t$, so that

$$H\xi\psi = -i\hbar(d_t\xi)\psi - i\hbar\xi(d_t\psi).$$

It is now easily seen that $d_t\xi$ is just the ordinary time differential coefficient $\dot\xi$.
(This is to be contrasted with the corresponding result for the $d_x$ operator, namely,
$d_x\xi = -\partial\xi/\partial x$.) We thus obtain

$$H\xi\psi = -i\hbar\dot\xi\psi + \xi H\psi,$$

which gives

$$i\hbar\dot\xi = \xi H - H\xi.$$

This is of the same form as (13), with for Hamiltonian just the $H$ defined in terms
of the time-displacement operator $d_t$ by (40).

The above argument is quite general and shows that the equations of motion for 
any dynamical system are expressible in terms of a Hamiltonian in the form (13), 
whether this system is one that has an analogue in the classical theory and is 
describable in terms of canonical co-ordinates and momenta or not. The general 
dynamical system in quantum mechanics is thus one in which the dynamical 
variables satisfy arbitrary commutability relations, and there is a Hamiltonian 
which is an arbitrary real function of them. More generally still, we may have 
a system in which the Hamiltonian cannot be expressed as an analytic function 
of dynamical variables and can be specified only through its representative in 
some representation, which representative may be an arbitrary Hermitian matrix. 
An example of a system of this more general kind is provided by the problem, 
considered in Chapter XII, of the interaction of a photon with an atom. 

Corresponding to equation (37) we can prove the result 


$$d_t = -\frac{\partial}{\partial t}. \tag{42}$$





§ ‘pure’ omitted
We must first give a meaning to the operator $\partial/\partial t$ applied to a $\psi$-symbol,
which we can do with the help of a representation in which a complete set of
commuting observables $q_{1t}, q_{2t}, \ldots, q_{nt}$ are diagonal, which observables must be
the values at time $t$ of a set of dynamical variables $q_1, q_2, \ldots, q_n$ (which need not
necessarily have conjugate momenta $p_1, p_2, \ldots, p_n$). The representative of any
$\psi$-symbol $\psi$ will now be a function of the $n$ variables $q_{1t}', q_{2t}', \ldots, q_{nt}'$, the form of
this function depending in general on $t$. Thus we can regard this representative as
a function of the $n + 1$ variables $q_{1t}', q_{2t}', \ldots, q_{nt}', t$, and as such can differentiate it
partially with respect to $t$ and define the resulting function to be the representative
of $\partial\psi/\partial t$. We get in this way a general definition of the operator $\partial/\partial t$ in
which there is, of course, a considerable amount of indefiniteness, owing not
only to the arbitrary phases of the representation but also to the fact that
we can take different sets of $q$’s to be diagonal and will then in general get
different results. We are interested, however, in only one of the operators $\partial/\partial t$,
this being the one that is given when the phases of the representation do not
depend explicitly on $t$, so that when a time displacement $\delta t$ is applied to a state,
the $q_{t+\delta t}$-representative of the displaced state is the same function of its variables
$q_{t+\delta t}'$ that the $q_t$-representative of the undisplaced state is of its variables $q_t'$.
Thus to obtain the $q_t$-representative of the displaced state we must substitute $t - \delta t$
for $t$ in the $q_t$-representative of the undisplaced state, considered as a function
of the $n + 1$ variables $q_{1t}', q_{2t}', \ldots, q_{nt}', t$. There is now complete analogy with
the $x$-displacement case, so that (42) follows in the same way as (37). The validity
of (42) shows that the operator $\partial/\partial t$ defined by a representation with phases
not explicitly dependent on $t$ is independent of which set of $q$’s are diagonal in
the representation. If we have one representation giving a $\partial/\partial t$ operator that
satisfies (42), we can obtain another by making any canonical transformation for
which the transformation function does not involve $t$.
From (41) and (42) we obtain

$$i\hbar\,\partial\psi/\partial t = H\psi. \tag{43}$$

This may be regarded as an alternative way of expressing the equations of motion
of the system. Expressed in terms of representatives, it gives us†

$$i\hbar\frac{\partial}{\partial t}(q_t'|) = \int (q_t'|H|q_t'')\,dq_t''\,(q_t''|), \tag{44}$$

an equation which shows how the representative $(q_t'|)$ of a state, considered as
a function of the $n + 1$ variables $q_{1t}', q_{2t}', \ldots, q_{nt}', t$, varies with $t$. When the $q_t$ have
conjugate momenta $p_t$, it reduces to the ‘ordinary’ differential equation

$$i\hbar\frac{\partial}{\partial t}(q_t'|) = H\!\left(q_t', -i\hbar\frac{\partial}{\partial q_t'}\right)(q_t'|). \tag{45}$$

† The case of continuous $q'$’s is taken for definiteness, the usual modifications in the notation
being required for the discrete case.


This equation was discovered by Schrödinger and is known as Schrödinger’s wave
equation. It is very useful in applications of quantum mechanics since its solutions 
have an immediate physical interpretation, the square of the modulus of any 
solution giving the probability of the q’s having specified values for one particular 
state throughout all time. It is called a wave equation because in many elementary 
examples, as will be seen in the next chapter, its solutions are of the form of 
waves moving through q-space. For this same reason the solutions are called 
wave functions, even also in those examples where they have no resemblance 
to waves. 

When the Hamiltonian does not involve the time explicitly, the wave equation
in the form (45) or in the more general form (44) will have solutions that vary
periodically with the time, according to

$$(q_t'|) = (q'|)_0\,e^{-iW't/\hbar}, \tag{46}$$

where $W'$ is a number and $(q'|)_0$ is independent of $t$. The equation that $(q'|)_0$ must
satisfy is

$$W'(q'|)_0 = \int (q'|H|q'')\,dq''\,(q''|)_0 = H\!\left(q', -i\hbar\frac{\partial}{\partial q'}\right)(q'|)_0.$$

But this is just the equation for determining the eigenvalues of $H$, namely,
equation (29) with $H$ for $f$. Thus $W'$ is an eigenvalue of $H$ or energy-level of
the system and $(q'|)_0$ is an eigenfunction of $H$.
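
A simple instance of (45) and (46) is a free particle in one dimension. The sketch below is an illustration under assumptions that are not in the text: the Hamiltonian is taken as $H = p^2/2m$, the units, mass and wave number are arbitrary, and derivatives are approximated by central differences. The trial solution is a plane wave of the periodic form (46) with $W' = \hbar^2k^2/2m$.

```python
import cmath

HBAR = 1.0                              # assumption: units with hbar = 1
MASS = 2.0                              # hypothetical particle mass
K = 1.3                                 # hypothetical wave number
W = HBAR ** 2 * K ** 2 / (2 * MASS)     # energy-level W' for H = p^2/2m

def wave(q, t):
    """Trial solution (q'|)_0 * exp(-i W' t / hbar) with (q'|)_0 = exp(i k q')."""
    return cmath.exp(1j * K * q) * cmath.exp(-1j * W * t / HBAR)

def lhs(q, t, eps=1e-4):
    # i * hbar * d/dt of the representative, by a central difference in t.
    return 1j * HBAR * (wave(q, t + eps) - wave(q, t - eps)) / (2 * eps)

def rhs(q, t, eps=1e-4):
    # H(q', -i hbar d/dq') applied to the representative:
    # for H = p^2/2m this is -(hbar^2 / 2m) d^2/dq'^2.
    second = (wave(q + eps, t) - 2 * wave(q, t) + wave(q - eps, t)) / eps ** 2
    return -(HBAR ** 2) / (2 * MASS) * second

samples = [(q, t) for q in (-2.0, 0.0, 1.5) for t in (0.0, 0.7, 3.0)]
residual = max(abs(lhs(q, t) - rhs(q, t)) for q, t in samples)

# The squared modulus, which gives the probability, does not change with t.
density_change = abs(abs(wave(0.4, 0.0)) ** 2 - abs(wave(0.4, 5.0)) ** 2)
```

The residual vanishes to within the finite-difference error, and the squared modulus of the solution is independent of the time, as expected for a solution of the form (46).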


38. Heisenberg’s Matrices 


In the preceding section we dealt with a $q_t$-representation, defined by observables
$q_t$ that are the values at time $t$ of a set of dynamical variables $q$. We saw
that if the phases of the representation are suitably chosen, then Schrödinger’s
equation holds, in the form (44) or (45), in which case the representation may
conveniently be called a Schrödinger representation. The condition for the phases
is such that, when a state is given a time-displacement $\delta t$, the $q_{t+\delta t}$-representative of
the displaced state is the same function of its variables $q_{t+\delta t}'$ as the $q_t$-representative
of the undisplaced state is of its variables $q_t'$. This condition will hold in
an analogous form for observables. If we take an observable $\xi_t$ which is the value
at time $t$ of a dynamical variable $\xi$, then the displaced observable will be $\xi_{t+\delta t}$.
We shall then have that the $q_{t+\delta t}$-representative of the displaced observable,
namely $(q_{t+\delta t}'|\xi_{t+\delta t}|q_{t+\delta t}'')$, is the same function of its variables $q_{t+\delta t}'$ and $q_{t+\delta t}''$ as
the $q_t$-representative of the undisplaced observable, namely $(q_t'|\xi_t|q_t'')$, is of its
variables $q_t'$ and $q_t''$. This means simply that the form of the function $(q_t'|\xi_t|q_t'')$
of the variables $q_t'$ and $q_t''$ is independent of $t$. More concisely, one can say that
the Schrödinger representative of $\xi_t$ is independent of $t$.

† quote marks added

In general, when one wants a representation of observables, the Schrödinger one
would not be a convenient one to take, since it refers to a definite time $t$ and gives
simple representatives only for those observables $\xi_t$ referring to the same time $t$.
A convenient representation would now be one which makes no reference to any
time $t$, so that observables $\xi_{t_1}, \eta_{t_2}, \ldots$, referring to different times $t_1, t_2, \ldots$, could
all be represented simultaneously and would all be on the same footing. For such
a representation we should have

$$\left(\alpha'\left|\frac{d\xi_t}{dt}\right|\alpha''\right) = \frac{d}{dt}(\alpha'|\xi_t|\alpha''). \tag{47}$$

Such a representation can easily be obtained when the Hamiltonian does not
involve the time explicitly. In the general case it is not so easy and is therefore
then not very useful.

When $H$ does not involve the time explicitly we can take for the observables
$\alpha$ that are diagonal in our representation a complete set of commuting dynamical
variables that are constants of the motion. Then $H$ will commute with the $\alpha$’s
and will be a function of them, represented by a diagonal matrix

$$(\alpha'|H|\alpha'') = H'\delta_{\alpha'\alpha''},$$

$H'$ being written for $H(\alpha')$, for brevity. Our representation will now be one that
is independent of $t$ (provided the phases are independent of $t$), so that equation
(47) holds. There is now a simple law for the variation of the matrix elements of
$\xi_t$ with $t$. From the equation of motion (13) we obtain

$$i\hbar(\alpha'|\dot\xi_t|\alpha'') = (\alpha'|\xi_t|\alpha'')H'' - H'(\alpha'|\xi_t|\alpha''),$$

which, with the help of (47), becomes

$$i\hbar\frac{d}{dt}(\alpha'|\xi_t|\alpha'') = -(H' - H'')(\alpha'|\xi_t|\alpha'').$$

Hence $(\alpha'|\xi_t|\alpha'')$ varies with $t$ according to the law

$$(\alpha'|\xi_t|\alpha'') = (\alpha'|\xi|\alpha'')_0\,e^{i(H'-H'')t/\hbar}, \tag{48}$$

$(\alpha'|\xi|\alpha'')_0$ being independent of $t$. The variation is thus periodic with the frequency

$$|H' - H''|/2\pi\hbar = |H' - H''|/h. \tag{49}$$


This scheme of matrices, in which the Hamiltonian is diagonal and the matrix 
elements all vary with the time according to the law (48), was discovered 
by Werner Heisenberg in 1925 and was historically the first form of 
quantum mechanics. 
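
The law (48) and the frequency (49) can be illustrated with a system of two energy-levels. The numbers below are hypothetical, $\hbar$ is set equal to 1, and the time derivative is approximated by a central difference; the sketch checks the equation of motion in the form $i\hbar\,d(\alpha'|\xi_t|\alpha'')/dt = (\alpha'|\xi_t|\alpha'')(H'' - H')$.

```python
import cmath
import math

HBAR = 1.0                         # assumption: units with hbar = 1
ENERGIES = [0.5, 2.0]              # hypothetical energy-levels H', H''
XI0 = [[1.0, 0.3 - 0.2j],          # hypothetical Hermitian matrix (a'|xi|a'')_0
       [0.3 + 0.2j, -0.4]]
N = len(ENERGIES)

def xi(t):
    """Matrix elements at time t according to the law (48)."""
    return [[XI0[a][b] * cmath.exp(1j * (ENERGIES[a] - ENERGIES[b]) * t / HBAR)
             for b in range(N)] for a in range(N)]

def commutator(t):
    """Elements of xi H - H xi for diagonal H: element times (H'' - H')."""
    m = xi(t)
    return [[m[a][b] * (ENERGIES[b] - ENERGIES[a]) for b in range(N)]
            for a in range(N)]

def ih_derivative(t, eps=1e-5):
    """i * hbar * d/dt of each matrix element, by a central difference."""
    plus, minus = xi(t + eps), xi(t - eps)
    return [[1j * HBAR * (plus[a][b] - minus[a][b]) / (2 * eps)
             for b in range(N)] for a in range(N)]

def max_diff(m1, m2):
    return max(abs(m1[a][b] - m2[a][b]) for a in range(N) for b in range(N))

t0 = 0.8
residual = max_diff(ih_derivative(t0), commutator(t0))

# Diagonal elements do not vary with the time.
diagonal_drift = max(abs(xi(3.0)[a][a] - XI0[a][a]) for a in range(N))

# One period is the reciprocal of the frequency (49), with h = 2*pi*hbar.
period = 2 * math.pi * HBAR / abs(ENERGIES[0] - ENERGIES[1])
periodic_error = max_diff(xi(t0 + period), xi(t0))
```

The off-diagonal elements return to their values after one period while the diagonal elements stay fixed, in line with the remarks that follow on stationary states.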

A diagonal element $(\alpha'|\xi_t|\alpha')$ does not vary with the time. This diagonal element
is the average value of $\xi_t$ for a fundamental state $\psi(\alpha')$ of the representation.
Thus for each fundamental state $\psi(\alpha')$ the average value of any dynamical variable
$\xi$ is a constant. The probability of $\xi$ having any specified value is therefore also
constant, since this probability is determined by the average value of functions
of $\xi$. Thus each $\psi(\alpha')$ is a stationary state according to the definition of §3.
The fundamental states of a Heisenberg representation are stationary states.
Any eigenstate of $H$ may be taken as a fundamental state of a Heisenberg
representation and is therefore a stationary state.

The matrices of Heisenberg’s representation fit in very well with 
the ‘anschaulich’ forms of quantum theory in existence before quantum mechanics, 
in particular with Bohr’s theory of the atom. The fundamental states of 
the representation are Bohr’s stationary states (which are really stationary, 
of course, only so long as one neglects the interaction of the atom with radiation) 
and the eigenvalues of H are Bohr’s energy-levels. It follows that the frequency (49) 
of matrix elements referring to two states $\alpha'$ and $\alpha''$ is that of the quantum of 
radiation emitted or absorbed according to Bohr’s theory when the atom makes 
a jump from one of these states to the other, as was assumed by Heisenberg in 
his first work on quantum mechanics. There now arises a strong correspondence 
between the matrix elements representing any dynamical variable and the Fourier 
components of that variable in the classical theory for a multiply-periodic system. 
This correspondence led Heisenberg to the assumption that the rate of spontaneous 
emission of radiation of a system in the quantum theory can be obtained 
from the classical formula if one substitutes in this formula for the Fourier 
components of the total electric displacement of the system the corresponding 
matrix elements. According to this assumption, a system having an electric 
moment D (a vector) will, when in the state $\alpha'$, emit radiation of frequency 
$\nu = (H' - H'')/h$, where $H'' = H(\alpha'')$ is an energy-level, less than $H'$, of some 
state $\alpha''$, at the rate 

$$ \frac{4}{3}\,\frac{(2\pi\nu)^4}{c^3}\,|(\alpha'|D|\alpha'')|^2. \tag{50} $$

Also the distribution of this radiation over the different directions of emission and 
its state of polarization for each direction will be the same as that for a classical 
electric dipole of moment 

$$ (\alpha'|D|\alpha'') + (\alpha''|D|\alpha'). $$


To interpret this rate of emission of radiant energy according to Bohr’s theory, 
we must divide it by the quantum of energy of this frequency, namely hv, and 
call it the probability per unit time of this quantum being spontaneously emitted, 
with the atomic system simultaneously dropping to the state $\alpha''$ of lower energy. 
A justification for these assumptions of Heisenberg will be obtained in Chapter XII, 
where a quantum treatment of the interaction of an atomic system with radiation 
will be given. 

By altering the phases in a Heisenberg representation we can pass to 
the Schrödinger representation in which the same $\alpha$'s are diagonal. Let us see 
what is the connexion between the phases in the two cases. In the Schrödinger 
representation the representative of any state will satisfy the wave equation 

$$ i\hbar\frac{\partial}{\partial t}(\alpha'|) = \sum_{\alpha''}(\alpha'|H|\alpha'')(\alpha''|) = H'(\alpha'|), $$

which can in this case be integrated directly and gives 

$$ (\alpha'|) = (\alpha'|)_0\,e^{-iH't/\hbar}, $$

where $(\alpha'|)_0$ is independent of t. On the other hand, the representative of a state 
in the Heisenberg representation will not depend on t, since the representation 
and also, of course, the state do not in any way depend on t. Hence the phases 
of the Schrödinger representation are $e^{-iH't/\hbar}$ relative to those of the Heisenberg 
representation, a result which could have been obtained alternatively from 
a comparison of (48) with the fact that the Schrödinger representative of 
$\xi$ is independent of t. There is thus a difference between the phases of 
the Heisenberg representation, which are totally independent of t, and those of 
the Schrödinger representation, which are explicitly independent of t. The explicit 
independence of t for the Schrödinger representation means simply that any matrix 
in this representation represents a function of the dynamical variables that does 
not involve t explicitly. 


VII. ELEMENTARY APPLICATIONS 


39. The Free Particle 


In this chapter we shall consider some simple dynamical systems according to 
quantum mechanics. The simplest of all systems is that of a particle in free 
space. For this system we may take as dynamical variables the three Cartesian 
co-ordinates x, y, z and their conjugate momenta $p_x$, $p_y$, $p_z$. The Hamiltonian 
in classical mechanics, when one takes into account the variation of the mass of 
the particle with its velocity required by the principle of relativity, is 

$$ H = c\,(m^2c^2 + p_x^2 + p_y^2 + p_z^2)^{1/2}, \tag{1} $$

where m is the rest-mass of the particle and c is the velocity of light, 
and the positive square root is taken. This Hamiltonian may be taken over into 
the quantum theory when one gives the meaning of §16 to the positive square root, 
which one can do since the eigenvalues of $m^2c^2 + p_x^2 + p_y^2 + p_z^2$ are all positive. 

The momenta $p_x$, $p_y$, $p_z$ commute with H and are thus constants of 
the motion, as in the classical theory. Again, the co-ordinates x, y, z vary 
according to the equations 

$$ \dot{x} = [x, H] = \frac{c^2p_x}{H}, \qquad \dot{y} = \frac{c^2p_y}{H}, \qquad \dot{z} = \frac{c^2p_z}{H}, \tag{2} $$

the same as in the classical theory. These equations may be verified in the quantum 
theory by an application of equation (16) of §34, which equation, as remarked 
at the end of that section, holds also for functions that are not expressible as 
power series. The general proof of this equation, however, required the use of 
a representation. It is of interest to notice that we can deduce (2) by working in 
abstract symbols and not making any use of representations, in the following way. 
We have by a direct application of the quantum conditions 

$$ xH^2 - H^2x = c^2(xp_x^2 - p_x^2x) = 2i\hbar c^2p_x, \tag{3} $$

or 

$$ (xH - Hx)H + H(xH - Hx) = 2i\hbar c^2p_x. \tag{4} $$

Now H commutes with $p_x$ and hence from (3) 

$$ (xH^2 - H^2x)H - H(xH^2 - H^2x) = 0, $$

which gives 

$$ (xH - Hx)H^2 - H^2(xH - Hx) = 0. $$

We must now use the condition that $(m^2c^2 + p_x^2 + p_y^2 + p_z^2)^{1/2}$, being defined 
as a square-root function, commutes with everything that commutes with 
$m^2c^2 + p_x^2 + p_y^2 + p_z^2$, i.e. H commutes with everything that commutes with $H^2$. 
We have just seen that $H^2$ commutes with $xH - Hx$ and hence H must commute 
with $xH - Hx$. We can now infer from (4) that 

$$ xH - Hx = i\hbar c^2p_x/H, $$


which gives the first of the equations (2). We thus have an illustration of the 
fact that any result that may be obtained with the help of a representation 
can also be obtained from the abstract symbols alone without reference to 
representations, but that the method with a representation may be much quicker 
and more convenient. 

The Schrödinger equation for the Hamiltonian (1) is† 

$$ i\hbar\frac{\partial}{\partial t}(x|) = c\left\{m^2c^2 - \hbar^2\left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}\right)\right\}^{1/2}(x|), \tag{5} $$

where the x in (x|) stands for x, y and z. We have here on the right-hand side 
the square root of an operator involving $\partial/\partial x$, ..., which square root cannot be 
expressed as a power series that is valid for the whole range of eigenvalues of $p_x$, 
$p_y$, $p_z$, namely $-\infty$ to $\infty$. To give a meaning to such a function of an operator 
we should in general have to make a canonical transformation to a representation in 
which the observable corresponding to this operator is diagonal, when the meaning 
would be as given in §15. Our present example is, however, sufficiently simple for 
this not to be necessary. We can write down solutions of (5) immediately, namely 

$$ (x|) = a\exp i(p'_xx + p'_yy + p'_zz - W't)/\hbar, \tag{6} $$

where $p'_x$, $p'_y$, $p'_z$, $W'$ are numbers satisfying 

$$ W'^2 = c^2(m^2c^2 + p'^2_x + p'^2_y + p'^2_z), \qquad W' > 0, $$


and a is an arbitrary number. The general solution of (5) can be expressed as 
a sum or integral of solutions of the form (6). 





†The primes are omitted from the variables in the wave function. This is permissible when 
it does not lead to confusion. 

The state represented by (6) is an eigenstate for the components of momentum, 
belonging to the eigenvalues $p'_x$, $p'_y$, $p'_z$. The corresponding value for the energy 
is $W'$. The representative (6) is, in fact, of the same form as the transformation 
function (36) of §35. Thus the state of a particle moving in free space with a given 
momentum is represented by plane waves of the type (6), the direction of motion 
of the waves being determined by $(p'_x, p'_y, p'_z)$, the momentum of the particle. 
The probability of the particle being found in any specified volume dx dy dz at 
time t is proportional to $|(x|)|^2\,dx\,dy\,dz$ and is thus independent of the position of 
this volume. The wave-length $\lambda$ of the waves is given by 

$$ \lambda = h/(p'^2_x + p'^2_y + p'^2_z)^{1/2} = h/P', \tag{7} $$

where $P'$ is the magnitude of the momentum of the particle, and their frequency 
$\nu$ is given by 

$$ \nu = W'/h. \tag{8} $$

Thus their velocity u is 

$$ u = \lambda\nu = W'/P' = c^2/v, \tag{9} $$

where v is the velocity of the particle. 

The fact that the velocity of the waves and the velocity of the particle both 
lie in the same direction and are connected by the relation (9) holds, of course, 
in all Lorentz frames of reference. It was this relativity invariance which first 
led Louis de Broglie, before the discovery of quantum mechanics, to postulate 
the existence of waves of the type (6) associated with the motion of a particle, 
which waves would control the particle in the same way in which light-waves 
control photons. The case of the photon may be obtained from that of the free 
particle by taking the rest-mass m equal to zero. The waves (6) then become just 
the light-waves associated with the photon, apart from polarization considerations 
and the fact that they involve an imaginary exponential instead of a sine or cosine. 


40. Wave Packets 


By superposing a number of solutions of the type (6) belonging to different values 
of the momentum p’ lying in the neighbourhood of a given value, one can obtain 
a solution that, at every instant of time, vanishes (approximately) everywhere 
outside a certain finite region. Within this region the waves are approximately of 
a single wave-length, corresponding to the given value of p’. This solution thus 
forms a group of waves or wave packet. The velocity V of such a wave packet is 
not equal to the velocity of the waves, but lies in the same direction and is given 
by the hydrodynamical formula for group velocity 


$$ V = \frac{d\nu}{d(1/\lambda)}. $$

With the help of (7) and (8), this becomes 

$$ V = \frac{dW'}{dP'} = c\,\frac{d}{dP'}(m^2c^2 + P'^2)^{1/2} = \frac{c^2P'}{W'} = v. $$

Thus the group velocity is the same as the velocity of the particle. 
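The relations (9) and the group-velocity formula can be illustrated numerically. The sketch below assumes an electron as the particle and SI values for the constants; the function names are hypothetical and chosen only for this example. The group velocity is obtained by a finite difference of $W'$ with respect to $P'$, not from the closed formula, so that the comparison with v is a genuine check.

```python
import math

C = 299_792_458.0        # velocity of light, m/s
M_E = 9.109_383_7e-31    # electron rest-mass, kg (illustrative choice of particle)


def energy(p, m=M_E):
    """W' = c (m^2 c^2 + P'^2)^(1/2), from the Hamiltonian (1)."""
    return C * math.sqrt(m * m * C * C + p * p)


def particle_velocity(p, m=M_E):
    """v = c^2 P' / W', from the equations of motion (2)."""
    return C * C * p / energy(p, m)


def phase_velocity(p, m=M_E):
    """u = lambda * nu = W' / P', equation (9)."""
    return energy(p, m) / p


def group_velocity(p, m=M_E, rel_step=1e-6):
    """V = d(nu)/d(1/lambda) = dW'/dP', estimated by a central difference."""
    dp = p * rel_step
    return (energy(p + dp, m) - energy(p - dp, m)) / (2 * dp)
```

For any momentum the product `phase_velocity(p) * particle_velocity(p)` equals $c^2$, so the waves travel faster than light while the packet travels with the particle.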

This important result was first obtained by de Broglie. It is capable 
of wide generalizations. If we have any dynamical system describable by 
a Hamiltonian H(q,p), which is an arbitrary function of canonical q’s and p’s, 
then, if it is permissible to treat Planck’s constant h as small so that terms 
involving it as a factor may be neglected, the Schrödinger equation will admit 
of solutions consisting of wave packets whose motions are along the trajectories of 
the classical theory. The proof is as follows. The Schrödinger equation is 

$$ i\hbar\frac{\partial}{\partial t}(q|) = H\!\left(q, -i\hbar\frac{\partial}{\partial q}\right)(q|). \tag{10} $$


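We express the Schrödinger function (q|) as though it were of the form of 
waves, thus 

$$ (q|) = e^{iS/\hbar}A, $$

where A and S are real functions of the q’s, which give the amplitude and phase 
respectively. The effect of the operator $-i\hbar\,\partial/\partial q_r$ on (q|) is now 

$$ -i\hbar\frac{\partial}{\partial q_r}(q|) = e^{iS/\hbar}\left(\frac{\partial S}{\partial q_r} - i\hbar\frac{\partial}{\partial q_r}\right)A, \tag{11} $$

and that of the operator $i\hbar\,\partial/\partial t$ is 

$$ i\hbar\frac{\partial}{\partial t}(q|) = e^{iS/\hbar}\left(-\frac{\partial S}{\partial t} + i\hbar\frac{\partial}{\partial t}\right)A. $$

If f is any function of the operators $-i\hbar\,\partial/\partial q_r$ expressible as a power series, 
one finds readily by repeated applications of (11) 

$$ f\!\left(-i\hbar\frac{\partial}{\partial q_r}\right)(q|) = e^{iS/\hbar}f\!\left(\frac{\partial S}{\partial q_r} - i\hbar\frac{\partial}{\partial q_r}\right)A. $$

Thus (10) becomes, after removal of the factor $e^{iS/\hbar}$, 

$$ \left(-\frac{\partial S}{\partial t} + i\hbar\frac{\partial}{\partial t}\right)A = H\!\left(q, \frac{\partial S}{\partial q} - i\hbar\frac{\partial}{\partial q}\right)A. \tag{12} $$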


The right-hand side, considered as a function of the $(\partial S/\partial q - i\hbar\,\partial/\partial q)$’s may be 
expanded by Taylor’s theorem as a power series in $\hbar$, which we are supposing to be 
a small number. The terms in this expansion are alternately real and imaginary.† 
If we neglect all except the first two and equate these to the real and imaginary‡ 
parts of the left-hand side of (12), we obtain 

$$ -\frac{\partial S}{\partial t} = H\!\left(q, \frac{\partial S}{\partial q}\right) \tag{13} $$

and 

$$ \frac{\partial A}{\partial t} = -\sum_r\frac{\partial H(q, \partial S/\partial q)}{\partial(\partial S/\partial q_r)}\,\frac{\partial A}{\partial q_r}. \tag{14} $$

Equation (13) is just the Hamilton-Jacobi equation of classical mechanics. 
Thus the phase of the Schrödinger wave function is given by the principal function S 
of the Hamilton-Jacobi theory when one counts $\hbar$ as small. Equation (14) is the one 
that governs the amplitude A of the wave function. It shows that for any solution 
S of (13) the amplitude remains constant along the trajectories given by 

$$ \frac{dq_r}{dt} = \frac{\partial H(q, \partial S/\partial q)}{\partial(\partial S/\partial q_r)} \tag{15} $$





but is otherwise arbitrary. Thus we can take A to vanish everywhere except on 
a certain group of neighbouring trajectories, along each of which it must have 
a constant value. We obtain in this way a solution of the wave equation that 
at any time vanishes everywhere outside a certain small region. There is a limit 
to how small this region may be, imposed by the approximations we have made. 
Our neglect of later terms in the Taylor expansion of the right-hand side of (12) 
is justified only provided 

$$ \hbar\frac{\partial A}{\partial q} \ll A\frac{\partial S}{\partial q}. $$

This requires that A shall vary by an appreciable fraction of itself only through 
a range of q in which S varies by many times h, i.e. a range of q consisting of many 
wave-lengths of the wave function. Thus our solution of the wave equation that 
vanishes everywhere outside a certain small region is of the nature of a wave packet. 
The motion of this wave packet is given by the trajectories (15), which are, 
when one remembers that $\partial S/\partial q_r$ is playing the part of $p_r$, just the trajectories of 
classical mechanics. 
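
As an illustration of equations (13), (14) and (15), consider a single co-ordinate q with the non-relativistic Hamiltonian $H = p^2/2m$. This choice of Hamiltonian, and the symbols $p'$ and g used below, are assumptions made only for this sketch and do not occur in the argument above. Equation (13) is then satisfied by a phase linear in q and t, and (14) and (15) take a simple form:

$$
\begin{aligned}
-\frac{\partial S}{\partial t} &= \frac{1}{2m}\left(\frac{\partial S}{\partial q}\right)^2, & S &= p'q - \frac{p'^2}{2m}\,t,\\
\frac{\partial A}{\partial t} &= -\frac{p'}{m}\,\frac{\partial A}{\partial q}, & A &= g\!\left(q - \frac{p'}{m}\,t\right),\\
\frac{dq}{dt} &= \frac{p'}{m}.
\end{aligned}
$$

Here $p'$ is a constant and g is an arbitrary real function. The amplitude is carried unchanged along the straight lines $q - p't/m = \text{const}$, which are the classical trajectories for the momentum $p'$, so that choosing g to vanish outside a small interval gives a packet moving with the classical velocity $p'/m$.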

For the system consisting of a free particle, a wave packet represents a state 
for which both the position and the momentum have definite numerical values 
to a certain limited degree of accuracy. Such a state is of the kind that usually 
occurs in practice, particularly if the particle has a large mass, since one usually 
knows roughly both the position and the momentum of a particle with which 





†‘pure’ omitted 
‡‘pure’ omitted 

one is dealing. If $\Delta x$ is the order of magnitude of the size of the wave packet, 
then, when one resolves the packet into its Fourier components, the wave-lengths 
of the different components will be distributed over a range of order 

$$ \Delta\lambda = \lambda^2/\Delta x. $$

From (7) this corresponds to a distribution of the momentum of the particle over 
a range of order‡ 

$$ \Delta p = h/\lambda^2 \cdot \Delta\lambda = h/\Delta x. $$

Thus we have 

$$ \Delta p\,\Delta x = h, \tag{16} $$


which shows there is a theoretical limit to the accuracy with which both 
the position and momentum may have definite numerical values together. 
The relation (16) is known as Heisenberg’s principle of indeterminacy. It shows 
how, the more accurately the position of a particle is known, the greater 
the indeterminacy in its momentum and vice versa. One would expect a principle 
of this type to hold simply from the quantum condition 


$$ xp - px = i\hbar. $$

It should be understood that (16) holds only in the most favourable case and 
that the indeterminacies may be much greater than is implied by this equation. 
In fact if one takes a wave packet for which (16) holds at one instant of time, 
in course of time this packet will spread and $\Delta p\,\Delta x$ will increase. For a discussion 
of this spreading and for a treatment of the motion of wave packets representing 
particles in fields of force, the reader is referred to papers by Earle Hesse Kennard 
and Charles Galton Darwin.† 
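
The most favourable case can be illustrated with a Gaussian packet. The sketch below is an illustration under stated assumptions: the Gaussian form, the width `sigma`, the mean wave-number `k0` and the units with $\hbar = 1$ are all choices made here, and the widths are defined as root-mean-square deviations. With that definition a Gaussian gives the product $\hbar/2$; relation (16) concerns only orders of magnitude, and the numerical factor depends on how the widths are defined.

```python
import cmath
import math

HBAR = 1.0  # units with hbar = 1 (an assumption)


def gaussian_packet(x, sigma, k0):
    """Unnormalised packet exp(-x^2 / 4 sigma^2) exp(i k0 x) and its x-derivative."""
    psi = math.exp(-x * x / (4 * sigma * sigma)) * cmath.exp(1j * k0 * x)
    dpsi = (-x / (2 * sigma * sigma) + 1j * k0) * psi
    return psi, dpsi


def widths(sigma=1.3, k0=5.0, half_width=20.0, steps=20001):
    """Return (dx, dp) as root-mean-square deviations, by a simple Riemann sum."""
    step = 2 * half_width / (steps - 1)
    norm = mean_x = mean_x2 = mean_p = mean_p2 = 0.0
    for i in range(steps):
        x = -half_width + i * step
        psi, dpsi = gaussian_packet(x, sigma, k0)
        weight = abs(psi) ** 2
        norm += weight
        mean_x += x * weight
        mean_x2 += x * x * weight
        mean_p += HBAR * (psi.conjugate() * dpsi).imag   # density of <p>
        mean_p2 += HBAR ** 2 * abs(dpsi) ** 2            # density of <p^2>
    mean_x, mean_x2, mean_p, mean_p2 = (
        value / norm for value in (mean_x, mean_x2, mean_p, mean_p2))
    dx = math.sqrt(mean_x2 - mean_x ** 2)
    dp = math.sqrt(mean_p2 - mean_p ** 2)
    return dx, dp
```

Halving `sigma` halves `dx` and doubles `dp`, which is the reciprocal behaviour expressed by (16).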

Heisenberg’s principle of indeterminacy applies also to general dynamical 
systems describable by means of canonical q’s and p’s. We have seen that such 
systems have states represented by wave packets moving in q-space. Any such 
state is one for which both the q’s and the p’s have numerical values to a certain 
degree of accuracy, the orders of magnitude of the minimum indeterminacy $\Delta q_r$ in 
a co-ordinate $q_r$ and $\Delta p_r$ in the conjugate momentum $p_r$ being connected by 

$$ \Delta p_r\,\Delta q_r = h. \tag{17} $$

‡‘·’ replaces ‘.’ 

†See Kennard, E.H. Zur Quantenmechanik einfacher Bewegungstypen. Z. Physik 44, 326-352 
(1927). https://doi.org/10.1007/BF01391200; and Darwin, C.G. (1927). Free Motion in the Wave 
Mechanics. Proceedings of the Royal Society A: Mathematical, Physical and Engineering 
Sciences, 117(776), 258-293. doi:10.1098/rspa.1927.0179 

This general relation may be deduced in the same way as (16) from the connexion 
between the size of a wave packet and the indeterminacy in the wave-length of its 
waves, or it may be inferred directly from the quantum condition 

$$ q_rp_r - p_rq_r = i\hbar. $$


The states dealt with in classical mechanics, of a system composed of massive 
particles or bodies, are represented by these wave packets and (17) gives the limit 
of accuracy of the classical treatment. 


41. The Harmonic Oscillator in One Dimension 


We shall now consider the problem of the harmonic oscillator in one dimension. 
The Hamiltonian for this system in classical mechanics is* 

$$ H = \frac{1}{2m}\cdot(p^2 + m^2\omega^2q^2), \tag{18} $$

where m is the mass of the oscillating particle and $\omega$ is another numerical constant, 
equal to $2\pi$ times the frequency. This Hamiltonian can be taken over into 
the quantum theory and must then be supplemented by the quantum condition 

$$ qp - pq = i\hbar \tag{19} $$

to give a completely determinate problem. 

The equations of motion are easily verified to be the same as in 
the classical theory. We must now determine the eigenvalues of the Hamiltonian H. 
This question is the same as that dealt with in §29, there being a difference only in 
the numerical constants, on account of the $\hbar$ in (19) and the 2m and $m^2\omega^2$ in (18). 
The present q is $(\hbar/m\omega)^{1/2}$ times the q of §29 and the present p is $(\hbar m\omega)^{1/2}$ times 
the p of §29, which results in the present H being $\tfrac12\hbar\omega$ times the $(p^2 + q^2)$ of §29. 
Thus from the result that the $(p^2 + q^2)$ of §29 has the eigenvalues 1, 3, 5, ..., we can 
infer that the present H has the eigenvalues 

$$ \tfrac12\hbar\omega, \quad \tfrac32\hbar\omega, \quad \tfrac52\hbar\omega, \quad \ldots $$





These are the possible values for the energy of a harmonic oscillator in 
the quantum theory. 

We shall now obtain the Heisenberg matrices representing p and q. These can 
be obtained readily from equations (34) of §29. Allowing for the change in the 





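*‘·’ replaces ‘.’ 

numerical constants and remembering that the A of §20 is equal to $(2H/\hbar\omega - 1)$, 
we find 

$$
\begin{aligned}
(H'|p|H' - \hbar\omega) &= (\tfrac12 m)^{1/2}\,(H' - \tfrac12\hbar\omega)^{1/2}\,e^{i(\omega t+\gamma)}\\
(H' - \hbar\omega|p|H') &= (\tfrac12 m)^{1/2}\,(H' - \tfrac12\hbar\omega)^{1/2}\,e^{-i(\omega t+\gamma)}\\
(H'|q|H' - \hbar\omega) &= -\frac{i}{(2m)^{1/2}\omega}\cdot(H' - \tfrac12\hbar\omega)^{1/2}\,e^{i(\omega t+\gamma)}\\
(H' - \hbar\omega|q|H') &= \frac{i}{(2m)^{1/2}\omega}\cdot(H' - \tfrac12\hbar\omega)^{1/2}\,e^{-i(\omega t+\gamma)}
\end{aligned}
\tag{20}
$$

when the correct time-factors are included. In the classical theory we have, 
when we express p and q as Fourier series, 

$$
\begin{aligned}
p &= (2mH)^{1/2}\cos(\omega t + \gamma) = (\tfrac12 mH)^{1/2}\{e^{i(\omega t+\gamma)} + e^{-i(\omega t+\gamma)}\}\\
q &= (2H/m)^{1/2}\omega^{-1}\sin(\omega t + \gamma) = (H/2m)^{1/2}\omega^{-1}\{-ie^{i(\omega t+\gamma)} + ie^{-i(\omega t+\gamma)}\}
\end{aligned}
$$

This shows up the correspondence between the Fourier components of the classical 
theory and the Heisenberg matrix elements. The classical Fourier components are, 
of course, equal to these matrix elements when one neglects $\hbar$. 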
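
The matrix elements (20) can be checked against the quantum condition (19) and the Hamiltonian (18) by direct matrix multiplication. The following sketch makes several assumptions not in the text: the matrices are cut off at `N` rows and columns (the true matrices are infinite), the units are chosen so that $\hbar = m = \omega = 1$, and the matrices are taken at t = 0 with $\gamma = 0$. Because of the cut-off, the last diagonal element of each product is not reliable and is left out of any comparison.

```python
import math

HBAR, M, OMEGA = 1.0, 1.0, 1.0   # illustrative units (an assumption)
N = 6                            # cut-off size (an assumption; the true matrices are infinite)


def heisenberg_matrices(n=N):
    """Truncated matrices of p and q from (20) at t = 0 with gamma = 0."""
    p = [[0j] * n for _ in range(n)]
    q = [[0j] * n for _ in range(n)]
    for k in range(1, n):                      # upper state: H' = (k + 1/2) hbar omega
        amp = math.sqrt(k * HBAR * OMEGA)      # (H' - hbar omega / 2)^(1/2)
        p[k][k - 1] = p[k - 1][k] = math.sqrt(M / 2) * amp
        q[k][k - 1] = -1j * amp / (math.sqrt(2 * M) * OMEGA)
        q[k - 1][k] = 1j * amp / (math.sqrt(2 * M) * OMEGA)
    return p, q


def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def commutator_and_energy(n=N):
    """Return the matrices qp - pq and H = (p^2 + m^2 omega^2 q^2) / 2m."""
    p, q = heisenberg_matrices(n)
    qp, pq, pp, qq = matmul(q, p), matmul(p, q), matmul(p, p), matmul(q, q)
    comm = [[qp[i][j] - pq[i][j] for j in range(n)] for i in range(n)]
    ham = [[(pp[i][j] + (M * OMEGA) ** 2 * qq[i][j]) / (2 * M) for j in range(n)]
           for i in range(n)]
    return comm, ham
```

Apart from the last row and column, `comm` is $i\hbar$ times the unit matrix and `ham` is diagonal with the elements $(n + \tfrac12)\hbar\omega$.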

If the oscillator carries an electric charge e, its electric moment will be eq. 
According to Heisenberg’s assumption, given in §38, for the spontaneous emission 
of radiation, the oscillator will then emit only radiation of frequency $\omega/2\pi$ since 
all the matrix elements of q vanish except those mentioned in (20). This result 
is the same as in the classical theory. When the oscillator is in a state of energy 
$H' = (n + \tfrac12)\hbar\omega$, or, as we may say, when it is in its n-th quantum state, its rate 
of emission of radiation, according to (50) of §38, will be 

$$ \frac{4}{3}\,\frac{\omega^4}{c^3}\,\frac{e^2}{2m\omega^2}\,(H' - \tfrac12\hbar\omega) = \frac{2\hbar e^2\omega^3}{3mc^3}\,n, \tag{21} $$

giving a probability* $2e^2\omega^2/3mc^3 \cdot n$ per unit time of the oscillator jumping from 
state n to state n − 1. In the state of lowest energy, for which n = 0, there is no 
emission of radiation. 
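
To give an idea of the magnitudes involved, formula (21) and the jump probability may be evaluated numerically. The sketch below assumes Gaussian (c.g.s.) units, an electron as the oscillating charge, and an angular frequency of $3 \times 10^{15}$ per second in the usage example; all of these are illustrative assumptions, as are the function names.

```python
import math

E_CHARGE = 4.803_204_7e-10   # electron charge, e.s.u. (Gaussian units)
M_E = 9.109_383_7e-28        # electron mass, g
C = 2.997_924_58e10          # velocity of light, cm/s
HBAR = 1.054_571_8e-27       # erg s


def moment_element_squared(n, omega, mass=M_E, charge=E_CHARGE):
    """|(H'|eq|H' - hbar omega)|^2 from (20), with H' - hbar omega / 2 = n hbar omega."""
    return charge ** 2 * n * HBAR * omega / (2 * mass * omega ** 2)


def energy_emission_rate(n, omega):
    """Formula (50) of section 38 with 2 pi nu = omega, in erg per second."""
    return 4.0 / 3.0 * omega ** 4 / C ** 3 * moment_element_squared(n, omega)


def jump_probability(n, omega, mass=M_E, charge=E_CHARGE):
    """2 e^2 omega^2 n / (3 m c^3): probability per second of the jump n -> n - 1."""
    return 2 * charge ** 2 * omega ** 2 * n / (3 * mass * C ** 3)
```

With `omega = 3.0e15` the value of `jump_probability(1, omega)` is between $10^7$ and $10^8$ per second, and `energy_emission_rate(n, omega)` equals `jump_probability(n, omega)` multiplied by the quantum $\hbar\omega$.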

In the classical treatment of periodic and multiply-periodic dynamical systems 
it is often convenient to make use of action and angle variables. We can introduce 
corresponding variables in the quantum theory. In our present problem of 
the harmonic oscillator we can define the action variable J by 


$$ J = H/\omega - \tfrac12\hbar. \tag{22} $$

It is a constant of the motion and its eigenvalues are integral multiples of $\hbar$ 
greater than or equal to zero. Thus its matrix representative in the Heisenberg 
representation is 

$$
\begin{pmatrix}
0 & 0 & 0 & 0 & 0 & \cdots\\
0 & \hbar & 0 & 0 & 0 & \cdots\\
0 & 0 & 2\hbar & 0 & 0 & \cdots\\
0 & 0 & 0 & 3\hbar & 0 & \cdots\\
0 & 0 & 0 & 0 & 4\hbar & \cdots\\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix}
$$

when the rows and columns are arranged in order of ascending energy-levels. 
To define the angle variable we introduce the two matrices 

$$
\begin{pmatrix}
0 & 0 & 0 & 0 & 0 & \cdots\\
1 & 0 & 0 & 0 & 0 & \cdots\\
0 & 1 & 0 & 0 & 0 & \cdots\\
0 & 0 & 1 & 0 & 0 & \cdots\\
0 & 0 & 0 & 1 & 0 & \cdots\\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix},
\qquad
\begin{pmatrix}
0 & 1 & 0 & 0 & 0 & \cdots\\
0 & 0 & 1 & 0 & 0 & \cdots\\
0 & 0 & 0 & 1 & 0 & \cdots\\
0 & 0 & 0 & 0 & 1 & \cdots\\
0 & 0 & 0 & 0 & 0 & \cdots\\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix},
$$

in which the non-vanishing elements are just to the left and just to the right 
of the principal diagonal respectively, and call the variables that they represent 
at time t = 0, $e^{iw}$ and $e^{-iw}$ respectively. These two matrices are conjugate 
complex, according to the definition of §21, and thus represent conjugate complex 
observables, in agreement with what is implied by the notation of $e^{iw}$ and $e^{-iw}$. 
This notation implies further, however, that the two matrices are the reciprocals of 
one another and this is not altogether true. The matrix representing the product 
$e^{-iw}e^{iw}$ is, in fact, just the unit matrix, but that representing $e^{iw}e^{-iw}$ differs from 
the unit matrix through having zero for its first diagonal element. Thus 

$$ e^{-iw}e^{iw} = 1, \qquad e^{iw}e^{-iw} \neq 1. \tag{23} $$


The variables $e^{iw}$ & $e^{-iw}$, defined above through their matrix representatives, 
are the best quantum analogues that we can get to the exponentials of i and 
−i times the angle variable of the classical theory. They have many properties 
analogous to those of their classical counterparts and their only serious defect is 
that $e^{iw}e^{-iw}$ is not quite equal to unity. Thus, for example, we obtain at once from 
the matrices the relations 

$$ e^{iw}J = (J - \hbar)e^{iw}, \qquad e^{-iw}J = (J + \hbar)e^{-iw}, \tag{24} $$

which are equivalent to the classical relations 

$$ [e^{iw}, J] = ie^{iw}, \qquad [e^{-iw}, J] = -ie^{-iw}. $$

Equations (24), when compared with equation (28) of Chapter II, are seen to be 
consistent with the view that J and w are conjugate dynamical variables satisfying 
the relation 

$$ wJ - Jw = i\hbar, $$

although actually this relation is meaningless since we cannot define w itself but 
only $e^{\pm iw}$. Again, the dynamical variable $e^{iw}$ at an arbitrary time t must be 
represented by a matrix whose elements vary with t according to the Heisenberg 
law $e^{i(H'-H'')t/\hbar}$. Since all the matrix elements vanish except those referring to 
consecutive energy-levels for which $H' - H'' = \hbar\omega$, every matrix element will vary 
with the time according to the law $e^{i\omega t}$. This corresponds to the fact that in the 
classical theory w increases linearly with t at the rate $\omega$. 
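
The relations (23) and (24) can be verified on finite portions of the matrices. In the sketch below the cut-off `N` and the units with $\hbar = 1$ are assumptions made for illustration. The cut-off introduces a spurious zero in the last diagonal element of the product $e^{-iw}e^{iw}$, which is a feature of the finite matrices only and is excluded from the comparison.

```python
HBAR = 1.0   # units with hbar = 1 (an assumption)
N = 6        # cut-off size (an assumption; the matrices in the text are infinite)


def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def action_angle_matrices(n=N):
    """Truncated matrices of J, e^{iw} (ones just left of the diagonal) and e^{-iw}."""
    j = [[i * HBAR if i == k else 0.0 for k in range(n)] for i in range(n)]
    e_plus = [[1.0 if i == k + 1 else 0.0 for k in range(n)] for i in range(n)]
    e_minus = [[1.0 if k == i + 1 else 0.0 for k in range(n)] for i in range(n)]
    return j, e_plus, e_minus


def shifted(j, shift):
    """Matrix of J + shift, the shift multiplying the unit matrix."""
    n = len(j)
    return [[j[i][k] + (shift if i == k else 0.0) for k in range(n)]
            for i in range(n)]
```

The relations (24) hold exactly for the truncated matrices, while the product `matmul(e_plus, e_minus)` shows the zero in the first diagonal element that is responsible for the inequality in (23).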

The co-ordinate and momentum q and p can be expressed in terms of the action 
and angle variables. The momentum p, for instance, is, according to (20), 
represented by the matrix 

$$
(\tfrac12 m\hbar\omega)^{1/2}
\begin{pmatrix}
0 & 1 & 0 & 0 & 0 & \cdots\\
1 & 0 & \sqrt2 & 0 & 0 & \cdots\\
0 & \sqrt2 & 0 & \sqrt3 & 0 & \cdots\\
0 & 0 & \sqrt3 & 0 & 2 & \cdots\\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix}
$$

with disregard of trivial phase factors, and hence 

$$ p = (\tfrac12 m\omega)^{1/2}(J^{1/2}e^{iw} + e^{-iw}J^{1/2}). $$

Similarly 

$$ q = (2m\omega)^{-1/2}(-iJ^{1/2}e^{iw} + ie^{-iw}J^{1/2}). $$

We see from these equations that p and q, when expressed in terms of the action 
and angle variables, involve them only through the two combinations $J^{1/2}e^{iw}$ 
and $e^{-iw}J^{1/2}$. Further, all dynamical variables that we may have to deal with 
to obtain any physical result must be functions of p and q and will therefore, 
when expressed in terms of the action and angle variables, involve them only 
through the two quantities $J^{1/2}e^{iw}$ and $e^{-iw}J^{1/2}$. Now it is easily verified from 
the matrix representatives that these two quantities are respectively equal to 

$$ J^{1/2}e^{iw} = e^{iw}(J + \hbar)^{1/2} \tag{25} $$

and 

$$ e^{-iw}J^{1/2} = (J + \hbar)^{1/2}e^{-iw}, \tag{26} $$

and that their products in either order are† 

$$ J^{1/2}e^{iw} \cdot e^{-iw}J^{1/2} = J, $$

$$ e^{-iw}J^{1/2} \cdot J^{1/2}e^{iw} = (J + \hbar)^{1/2}e^{-iw} \cdot e^{iw}(J + \hbar)^{1/2} = J + \hbar. $$

†‘·’ replaces ‘.’ 

These results hold in spite of the inequality in (23). They show that when we 
are dealing with dynamical variables of physical importance, which can involve 
the action and angle variables only through the two quantities $J^{1/2}e^{iw}$ and $e^{-iw}J^{1/2}$, 
we may count $e^{iw}$ and $e^{-iw}$ as truly reciprocal quantities without getting into error. 
Thus we can freely use the action and angle variables in complete analogy with 
the classical theory without getting incorrect physical results. 

The wave equation for the harmonic oscillator with Hamiltonian (18) is 

$$ i\hbar\frac{\partial}{\partial t}(q|) = \frac{1}{2m}\left\{-\hbar^2\frac{\partial^2}{\partial q^2} + m^2\omega^2q^2\right\}(q|). $$

The wave functions representing stationary states will be the periodic solutions 
of this equation, for which the operator $i\hbar\,\partial/\partial t$ is the same as multiplication by 
the energy-level $H'$. They will thus satisfy 

$$ H'(q|) = \frac{1}{2m}\left\{-\hbar^2\frac{\partial^2}{\partial q^2} + m^2\omega^2q^2\right\}(q|). \tag{27} $$

The general solution of this equation has been given by Erwin Schrödinger.* We 
shall here obtain some of the solutions representing states of lowest energy for use 
in the next section. 

Equation (27) reduces to 

$$ \left\{\frac{d^2}{dq^2} - \frac{q^2}{a^4} + \frac{2n+1}{a^2}\right\}(q|) = 0, \tag{28} $$

where $a^2$ is the number $\hbar/m\omega$ and $H'$ has been put equal to $(n + \tfrac12)\hbar\omega$. Put 

$$ (q|) = f(q)\,e^{-q^2/2a^2}. $$

Equation (28) now becomes 

$$ \frac{d^2f}{dq^2} - \frac{2q}{a^2}\frac{df}{dq} + \left[\frac{q^2}{a^4} - \frac{1}{a^2}\right]f - \frac{q^2}{a^4}f + \frac{2n+1}{a^2}f = 0 $$

or 

$$ \frac{d^2f}{dq^2} - \frac{2q}{a^2}\frac{df}{dq} + \frac{2n}{a^2}f = 0. $$

The solution of this equation, when n is any non-negative integer, is a finite power 
series in q. For n = 0, 1, 2, 3, ... 





*Schrödinger, E. (1926). Quantisierung als Eigenwertproblem. II. Annalen der Physik, 
384(6), 489-527. doi:10.1002/andp.19263840602, page 514, equation 22 

the solutions are easily verified to be 

$$ f(q) = 1, \quad q, \quad q^2 - \tfrac12a^2, \quad q^3 - \tfrac32qa^2, \quad \ldots $$

The successive eigenfunctions are thus 

$$
\begin{aligned}
(q|0) &= e^{-q^2/2a^2}, & (q|1) &= q\,e^{-q^2/2a^2},\\
(q|2) &= (q^2 - \tfrac12a^2)\,e^{-q^2/2a^2}, & (q|3) &= (q^3 - \tfrac32qa^2)\,e^{-q^2/2a^2}, \quad \ldots
\end{aligned}
\tag{29}
$$
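
The four polynomials may be checked by substituting them into the equation $f'' - (2q/a^2)f' + (2n/a^2)f = 0$. The sketch below stores each polynomial as a list of coefficients; the numerical value taken for $a^2$ is arbitrary and is an assumption of the example, as are the function names.

```python
A2 = 0.7   # value of a^2 = hbar / (m omega); an arbitrary illustrative number

# Coefficients [c0, c1, c2, ...] of f(q) = c0 + c1 q + c2 q^2 + ... for n = 0, 1, 2, 3.
SOLUTIONS = {
    0: [1.0],
    1: [0.0, 1.0],
    2: [-0.5 * A2, 0.0, 1.0],
    3: [0.0, -1.5 * A2, 0.0, 1.0],
}


def derivative(coeffs):
    """Coefficients of df/dq."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def residual(n, coeffs):
    """Coefficients of f'' - (2q/a^2) f' + (2n/a^2) f, which should all vanish."""
    d1 = derivative(coeffs)
    d2 = derivative(d1)
    out = [0.0] * (len(coeffs) + 1)
    for k, c in enumerate(d2):
        out[k] += c
    for k, c in enumerate(d1):
        out[k + 1] -= 2.0 * c / A2       # multiplying by q raises the power by one
    for k, c in enumerate(coeffs):
        out[k] += 2.0 * n * c / A2
    return out
```

Every entry of `residual(n, SOLUTIONS[n])` vanishes to rounding accuracy, whereas pairing a polynomial with the wrong value of n leaves a non-zero remainder.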


42. The Harmonic Oscillator in Two Dimensions 


Let us now suppose the harmonic oscillator of the preceding section can vibrate also 
in a second direction, at right angles to the first, with the same frequency $\omega/2\pi$. 
We shall then have a harmonic oscillator in two dimensions, whose Hamiltonian 
is† 

$$ H = \frac{1}{2m}\cdot(p_x^2 + p_y^2) + \tfrac12m\omega^2(x^2 + y^2), \tag{30} $$

where x and y are the co-ordinates and $p_x$ and $p_y$ the conjugate momenta. 
The study of this system is of interest as it provides beautiful examples of 
the superposition of states and also it can be applied to the problem of the 
polarization of a photon. 
The Hamiltonian (30) can be regarded as the sum of the Hamiltonians of two 
separate dynamical systems, namely, the two one-dimensional harmonic oscillators 
with the Hamiltonians‡§ 

$$ H_x = \frac{1}{2m}\cdot(p_x^2) + \tfrac12m\omega^2x^2, \qquad H_y = \frac{1}{2m}\cdot(p_y^2) + \tfrac12m\omega^2y^2. \tag{31} $$


On account of this fact there is a simple connexion between the eigenfunctions of 
the H of (30), representing stationary states of the whole system, and those of 
the $H_x$ and $H_y$ of (31), representing stationary states of the component systems. 
Let us first consider the general case of a system whose Hamiltonian H can be 
regarded as the sum of the Hamiltonians $H_1$ and $H_2$ of two separate dynamical 
systems, i.e. 

$$ H = H_1 + H_2, $$

where all the observables in $H_1$ are different from and commute with all those 
in $H_2$. We can now choose a complete set of commuting observables defining 
a representation, consisting of some observables $q_1$ that occur only in $H_1$ and some 
$q_2$ that occur only in $H_2$. This will result in the representative of H being of 
the form 

$$ (q_1'q_2'|H|q_1''q_2'') = (q_1'|H_1|q_1'')\,\delta(q_2' - q_2'') + \delta(q_1' - q_1'')\,(q_2'|H_2|q_2''), \tag{32} $$

†‡‘·’ replaces ‘.’ 
§Round brackets are included in analogy with (30) 

if we take the case of continuous q for definiteness. Now let $(q_1'|H_1')$ and $(q_2'|H_2')$ 
be eigenfunctions of $H_1$ and $H_2$ respectively, belonging to the eigenvalues $H_1'$ and 
$H_2'$, so that 

$$
\begin{aligned}
\int (q_1'|H_1|q_1'')\,dq_1''\,(q_1''|H_1') &= H_1'(q_1'|H_1'),\\
\int (q_2'|H_2|q_2'')\,dq_2''\,(q_2''|H_2') &= H_2'(q_2'|H_2').
\end{aligned}
$$

We shall then have from (32) 

$$
\begin{aligned}
\iint (q_1'q_2'|H|q_1''q_2'')\,dq_1''\,dq_2''\,&(q_1''|H_1')(q_2''|H_2')\\
&= \int (q_1'|H_1|q_1'')\,dq_1''\,(q_1''|H_1')\,(q_2'|H_2') + (q_1'|H_1')\int (q_2'|H_2|q_2'')\,dq_2''\,(q_2''|H_2')\\
&= H_1'(q_1'|H_1')(q_2'|H_2') + H_2'(q_1'|H_1')(q_2'|H_2').
\end{aligned}
$$

This shows that the product $(q_1'|H_1')(q_2'|H_2')$ is an eigenfunction of H belonging 
to the eigenvalue $H_1' + H_2'$. The product of eigenfunctions of the Hamiltonians of 
each of the component systems is an eigenfunction of the Hamiltonian of the whole 
system, the corresponding eigenvalue being the sum of those for the components. 
The physical meaning of this result is, of course, that when the component systems 
are in stationary states, the whole system is also in a stationary state, whose energy 
is the sum of those of the components and whose representative eigenfunction is 
the product of those of the components. 
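
The same result may be seen in a discrete analogue of (32), in which the integrals become sums and the $\delta$ functions become Kronecker symbols. The sketch below is such an analogue under stated assumptions: the two small matrices used in the usage note are hypothetical, and the function names are chosen only for this example.

```python
def total_hamiltonian(h1, h2):
    """Discrete analogue of (32): element [(i, j), (k, l)] = h1[i][k] d(j, l) + d(i, k) h2[j][l]."""
    n1, n2 = len(h1), len(h2)
    return {((i, j), (k, l)): h1[i][k] * (j == l) + (i == k) * h2[j][l]
            for i in range(n1) for j in range(n2)
            for k in range(n1) for l in range(n2)}


def apply_total(h1, h2, psi):
    """Apply H to a two-index representative psi[(k, l)]."""
    h = total_hamiltonian(h1, h2)
    n1, n2 = len(h1), len(h2)
    return {(i, j): sum(h[(i, j), (k, l)] * psi[(k, l)]
                        for k in range(n1) for l in range(n2))
            for i in range(n1) for j in range(n2)}


def product_state(u, v):
    """Representative of the product of an eigenfunction u of H1 and v of H2."""
    return {(i, j): u[i] * v[j] for i in range(len(u)) for j in range(len(v))}
```

For example, with `h1 = [[2.0, 1.0], [1.0, 2.0]]` and its eigenvector `[1.0, 1.0]` (eigenvalue 3), and `h2 = [[0.0, 1.0], [1.0, 0.0]]` with its eigenvector `[1.0, -1.0]` (eigenvalue −1), applying the total Hamiltonian to the product state multiplies it by 3 − 1 = 2.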

Let us apply this general result to our problem of the two-dimensional 
oscillator. We have already in the preceding section considered the eigenfunctions 
of Hamiltonians of the form of $H_x$ and $H_y$. Let $(x|n_x)$ and $(y|n_y)$ be 
eigenfunctions of $H_x$ and $H_y$, labelled by the quantum numbers $n_x$ and $n_y$, 
the corresponding energy-levels being $H_x' = (n_x + \tfrac12)\hbar\omega$ and $H_y' = (n_y + \tfrac12)\hbar\omega$ 
respectively. Their product 

$$ (x|n_x)(y|n_y) $$

will then be an eigenfunction of the Hamiltonian H of (30), belonging to 
the eigenvalue 

$$ H' = H_x' + H_y' = (n_x + n_y + 1)\hbar\omega. $$

Thus the eigenvalues of H are integral multiples of $\hbar\omega$ greater than zero. Each of 
these eigenvalues (except the lowest one $\hbar\omega$) belongs to several eigenfunctions, 
corresponding to the various possible ways of choosing $n_x$ and $n_y$ to have a given 
integer as sum. There are thus several stationary states with the same energy. 
A system for which this is the case is called degenerate. 
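
The degree of degeneracy of each level follows from counting the pairs of quantum numbers with a given sum. A minimal sketch of this counting is given below; the function name and the cut-off on the quantum numbers are assumptions of the example.

```python
from collections import defaultdict


def levels(max_quantum_number):
    """Group the states (n_x, n_y) by energy in units of hbar omega: H' = n_x + n_y + 1."""
    table = defaultdict(list)
    for nx in range(max_quantum_number + 1):
        for ny in range(max_quantum_number + 1 - nx):
            table[nx + ny + 1].append((nx, ny))
    return dict(table)
```

The level of energy $N\hbar\omega$ is found in this way to contain N independent states, the lowest level being the only one that is not degenerate.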

Let us now examine the eigenfunctions of some of the states of low energy, using 
the results (29) for the eigenfunctions for the one-dimensional oscillator. The state 
of lowest energy $\hbar\omega$ has the quantum numbers $n_x = 0$, $n_y = 0$ and is represented 
by the eigenfunction 

$$ (x|0)(y|0) = e^{-(x^2+y^2)/2a^2}. \tag{33} $$

There is only one state belonging to this energy-level, which is therefore 
non-degenerate. The next lowest energy-level $2\hbar\omega$ has two independent states 
belonging to it, corresponding to the two sets of quantum numbers $n_x = 1$, $n_y = 0$ 
and $n_x = 0$, $n_y = 1$. The two eigenfunctions are 

$$ (x|1)(y|0) = x\,e^{-(x^2+y^2)/2a^2}, \qquad (x|0)(y|1) = y\,e^{-(x^2+y^2)/2a^2}. \tag{34} $$

We can take any linear combination of these two eigenfunctions and get 
another eigenfunction representing another stationary state belonging to the same 
energy-level $2\hbar\omega$. 

Our two-dimensional harmonic oscillator has circular symmetry about 
the origin in the xy plane. Hence, if we take a new set of rectangular Cartesian 
co-ordinates $x^* = x\cos\theta + y\sin\theta$, $y^* = x\sin\theta - y\cos\theta$, the wave functions in $x^*$ 
& $y^*$ will be of the same form as those in x & y. The stationary state of energy 
$2\hbar\omega$ for which the $x^*$ component of oscillation is in the one-quantum state and 
the $y^*$ component in the zero-quantum state, i.e. for which $n_{x^*} = 1$, $n_{y^*} = 0$, 
will therefore be represented by the eigenfunction 

$$ x^*e^{-(x^{*2}+y^{*2})/2a^2}. $$

But this is equal to 

$$ (x\cos\theta + y\sin\theta)\,e^{-(x^2+y^2)/2a^2}, \tag{35} $$

which is a linear combination of the two eigenfunctions (34). 
Thus the one-quantum state of linear oscillation in any direction can be obtained 
by a superposition of the two one-quantum states of linear oscillation in the x and 
y directions respectively. 

The essential differences in the nature of this quantum superposition from 
that of classical superposition for the same dynamical system should be noted. 
In the classical theory if we superpose a state of linear oscillation of given energy 
in the x-direction with one of linear oscillation of the same energy in the y-direction, 
the resulting state will be of double the energy, instead of the same energy as in 
the quantum theory. Again, if this resulting state is one of linear oscillation, it must 
be in a direction at 45° to the original oscillations and cannot be in an arbitrary 
direction as in the quantum theory. 

The example of quantum superposition just discussed is directly applicable
to the problem of the polarization of a photon. A photon of given frequency
moving in a given direction may be regarded as a harmonic electromagnetic
oscillation in a one-quantum state. This oscillation may be resolved into
two perpendicular directions, corresponding to two independent states of linear
polarization of the photon, so it forms a dynamical system formally the same as
the two-dimensional oscillator investigated above. The wave functions (34) & (35)
may thus represent states of linear polarization of the photon. We see that the
state of a photon linearly polarized in an arbitrary direction $\theta$ can be obtained by
superposition of the states of polarization 0 and $\tfrac12\pi$. The relative weights of these
two states in the superposition process are given by the squares of the moduli of
the coefficients of the wave functions (34) in the expression (35) and are thus as
$\cos^2\theta : \sin^2\theta$, in agreement with the discussion in Chapter I.

We can superpose the two states of linear oscillation represented by the two
eigenfunctions (34) in such a way as to get a state of circular oscillation in either
direction about the origin, corresponding to a circularly polarized photon. To do
this we must take the following linear combinations of the eigenfunctions (34),

$$(x + iy)\,e^{-(x^2+y^2)/2a^2}, \qquad (x - iy)\,e^{-(x^2+y^2)/2a^2}. \tag{36}$$

These two new eigenfunctions will represent states of circular symmetry, as is at
once apparent from the fact that they remain invariant, except for multiplication
by a numerical factor, when one makes a transformation to the co-ordinates
$x^*$ & $y^*$. We can determine the direction of rotation for either of these
eigenfunctions from a consideration of the angular momentum. We define
the angular momentum, as in the classical theory, by $x p_y - y p_x$. It is represented by
the operator $-ih(x\,\partial/\partial y - y\,\partial/\partial x)$, which operator, when multiplied into the first
of the eigenfunctions (36), gives the result

$$\begin{aligned}
-ih\left(x\frac{\partial}{\partial y} - y\frac{\partial}{\partial x}\right)(x+iy)\,e^{-(x^2+y^2)/2a^2}
&= -ihx\left\{i - \frac{(x+iy)\,y}{a^2}\right\}e^{-(x^2+y^2)/2a^2} + ihy\left\{1 - \frac{(x+iy)\,x}{a^2}\right\}e^{-(x^2+y^2)/2a^2} \\
&= h(x+iy)\,e^{-(x^2+y^2)/2a^2}.
\end{aligned}$$

This operator is thus equivalent to multiplication by h, showing that the first of
the eigenfunctions (36) represents a state for which the angular momentum has
the value h. The second must now from symmetry represent a state for which
the angular momentum has the value −h. It should be noticed that the states of
linear oscillation represented by the eigenfunctions (34) are not states for which
the angular momentum has the value zero, as it would in the classical theory,
but are states for which there is an even chance of its having the value h or −h.
The state of lowest energy represented by (33) is one for which the angular
momentum has the value zero.

We can deal in the same way with the two-quantum states of energy $3h\omega$,
of which there are three independent ones, corresponding to the three sets of
quantum numbers $n_x = 2$, $n_y = 0$; $n_x = 1$, $n_y = 1$; $n_x = 0$, $n_y = 2$. The three
eigenfunctions are

$$\begin{aligned}
(x|2)(y|0) &= (x^2 - \tfrac12 a^2)\,e^{-(x^2+y^2)/2a^2}, \\
(x|1)(y|1) &= xy\,e^{-(x^2+y^2)/2a^2}, \\
(x|0)(y|2) &= (y^2 - \tfrac12 a^2)\,e^{-(x^2+y^2)/2a^2}.
\end{aligned} \tag{37}$$

The two-quantum state of linear oscillation in any direction $x^*$ will be represented
by the eigenfunction

$$(x^{*2} - \tfrac12 a^2)\,e^{-(x^{*2}+y^{*2})/2a^2} = \{(x\cos\theta + y\sin\theta)^2 - \tfrac12 a^2\}\,e^{-(x^2+y^2)/2a^2},$$

which is a linear combination of the three eigenfunctions (37). There are three
two-quantum states of circular oscillation, represented by the eigenfunctions

$$(x + iy)^2\,e^{-(x^2+y^2)/2a^2}, \qquad \{(x + iy)(x - iy) - a^2\}\,e^{-(x^2+y^2)/2a^2}, \qquad (x - iy)^2\,e^{-(x^2+y^2)/2a^2}.$$

It is easily verified that the angular momentum has the values 2h, 0, −2h for these
three states respectively.
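The angular-momentum values quoted for the circular states, h and −h for (36) and 2h, 0, −2h for the two-quantum states, can be spot-checked by applying the operator $-ih(x\,\partial/\partial y - y\,\partial/\partial x)$ numerically. This is a rough sketch using central differences at a single sample point; the units (h = 1, a = 1), the sample point, and every name below are assumptions for illustration only.

```python
import math

A = 1.0  # oscillator length scale a (demo value, an assumption)
H = 1.0  # work in units where h = 1 (an assumption)

def gauss(x, y):
    return math.exp(-(x * x + y * y) / (2 * A * A))

# Circular states (36) and the three two-quantum circular states.
STATES = {
    "x+iy":     lambda x, y: (x + 1j * y) * gauss(x, y),
    "x-iy":     lambda x, y: (x - 1j * y) * gauss(x, y),
    "(x+iy)^2": lambda x, y: (x + 1j * y) ** 2 * gauss(x, y),
    "r^2-a^2":  lambda x, y: (x * x + y * y - A * A) * gauss(x, y),
    "(x-iy)^2": lambda x, y: (x - 1j * y) ** 2 * gauss(x, y),
}

def angular_momentum(f, x, y, eps=1e-5):
    """Apply -ih (x d/dy - y d/dx) to f at (x, y) using central differences."""
    dfdx = (f(x + eps, y) - f(x - eps, y)) / (2 * eps)
    dfdy = (f(x, y + eps) - f(x, y - eps)) / (2 * eps)
    return -1j * H * (x * dfdy - y * dfdx)

def eigenvalue_estimate(name, x=0.3, y=-0.7):
    """Ratio (L f) / f at one point; meaningful only when f is an eigenfunction."""
    f = STATES[name]
    return angular_momentum(f, x, y) / f(x, y)
```

A single-point ratio is of course not a proof that a function is an eigenfunction; it only confirms the stated eigenvalue for functions already known to be eigenfunctions.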


43. The Spin of the Electron 


In dealing with problems about electrons according to quantum mechanics,
one finds one does not get agreement with experiment if one assumes the electrons
to be simply point charges repelling one another according to the Coulomb law
of force. It is necessary to make the assumption that each electron is spinning and
so has an internal angular momentum, and also that it has a magnetic moment.
To make the theory agree with experiment we must assume that the eigenvalues
of the Cartesian component of the spin angular momentum in any direction are $\tfrac12 h$
and $-\tfrac12 h$, and that the magnetic moment of the electron (with its sign reversed)
always lies in the same direction as the spin angular momentum and has as
eigenvalues for its component in any direction the values† $eh/2mc$ and $-eh/2mc$.
Thus if an electron in a certain state of spin has a spin angular momentum of
$\tfrac12 h$ in a particular direction, it will have a magnetic moment $-eh/2mc$ in this
same direction. A theoretical reason for these assumptions will be provided by
the relativity theory of the electron given in Chapter XIII. For the present we shall
merely take them as empirical results and investigate their principal consequences.





†The e here, denoting minus the charge on an electron, is, of course, to be distinguished from
the e denoting the base of exponentials.

Let $s_x$, $s_y$, $s_z$ be the three Cartesian components of the spin angular momentum.
We require quantum conditions for these three observables, to replace the classical
conditions that they all commute. In §44 the quantum conditions will be obtained
for the three components of the angular momentum about a point of a single
particle and also of a set of particles. It will be found that these quantum conditions
are of the same form for a single particle as for a set of particles, which suggests
that this form, namely equations (8) of §44, is the general one governing any
angular momentum, even the angular momentum of a spinning body. This gives
us the quantum conditions

$$[s_y, s_z] = s_x, \qquad [s_z, s_x] = s_y, \qquad [s_x, s_y] = s_z, \tag{38}$$

for $s_x$, $s_y$, $s_z$, which may be written alternatively

$$s_y s_z - s_z s_y = ih\,s_x, \qquad s_z s_x - s_x s_z = ih\,s_y, \qquad s_x s_y - s_y s_x = ih\,s_z, \tag{39}$$

and combined in the single vector equation

$$\mathbf{s} \times \mathbf{s} = ih\,\mathbf{s}.$$

There will be further algebraic relations satisfied by $s_x$, $s_y$, $s_z$, owing to the fact
that each of these observables has only two eigenvalues $\tfrac12 h$ and $-\tfrac12 h$. Thus its
square will have only the one eigenvalue $\tfrac14 h^2$ and may therefore be put equal to
the number $\tfrac14 h^2$, i.e.

$$s_x^2 = s_y^2 = s_z^2 = \tfrac14 h^2. \tag{40}$$

It is convenient to write

$$s_x = \tfrac12 h\,\sigma_x, \qquad s_y = \tfrac12 h\,\sigma_y, \qquad s_z = \tfrac12 h\,\sigma_z,$$

introducing the three new observables $\sigma_x$, $\sigma_y$ and $\sigma_z$. The magnetic moment of
the electron then has the components

$$-\frac{eh}{2mc}\,\sigma_x, \qquad -\frac{eh}{2mc}\,\sigma_y, \qquad -\frac{eh}{2mc}\,\sigma_z,$$

so that these three observables $\sigma_x$, $\sigma_y$ & $\sigma_z$ are sufficient to describe completely
the spin of the electron. They form the components of a vector $\boldsymbol\sigma$.
From (39) we find

$$\sigma_y\sigma_z - \sigma_z\sigma_y = 2i\sigma_x, \qquad \sigma_z\sigma_x - \sigma_x\sigma_z = 2i\sigma_y, \qquad \sigma_x\sigma_y - \sigma_y\sigma_x = 2i\sigma_z, \tag{41}$$

and from (40)

$$\sigma_x^2 = \sigma_y^2 = \sigma_z^2 = 1,$$

corresponding to the fact that each $\sigma$ has just the two eigenvalues 1 and −1.
From the first of equations (41)

$$\begin{aligned}
2i(\sigma_x\sigma_y + \sigma_y\sigma_x) &= (2i\sigma_x)\sigma_y + \sigma_y(2i\sigma_x) \\
&= (\sigma_y\sigma_z - \sigma_z\sigma_y)\sigma_y + \sigma_y(\sigma_y\sigma_z - \sigma_z\sigma_y) \\
&= -\sigma_z\sigma_y^2 + \sigma_y^2\sigma_z = 0,
\end{aligned}$$

so that $\sigma_x\sigma_y = -\sigma_y\sigma_x$.
Two observables like these which satisfy the commutative law of multiplication
except for a minus sign are said to anticommute. Thus $\sigma_x$ anticommutes with $\sigma_y$
and from symmetry any of the three observables $\sigma_x$, $\sigma_y$, $\sigma_z$ anticommutes with
any other. We now obtain from (41)

$$\begin{aligned}
\sigma_y\sigma_z &= i\sigma_x = -\sigma_z\sigma_y, \\
\sigma_z\sigma_x &= i\sigma_y = -\sigma_x\sigma_z, \\
\sigma_x\sigma_y &= i\sigma_z = -\sigma_y\sigma_x, \\
\sigma_x\sigma_y\sigma_z &= i.
\end{aligned} \tag{42}$$


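We must verify that the relations (42) are invariant under a rotation of
axes, in order to show that our assumptions about the spin are permissible.
Let the components of $\boldsymbol\sigma$ referred to a new set of mutually perpendicular axes be

$$\begin{aligned}
\sigma_1 &= l_1\sigma_x + m_1\sigma_y + n_1\sigma_z, \\
\sigma_2 &= l_2\sigma_x + m_2\sigma_y + n_2\sigma_z, \\
\sigma_3 &= l_3\sigma_x + m_3\sigma_y + n_3\sigma_z.
\end{aligned}$$

From (42) we now obtain

$$\begin{aligned}
\sigma_1^2 &= (l_1\sigma_x + m_1\sigma_y + n_1\sigma_z)^2 \\
&= l_1^2\sigma_x^2 + m_1^2\sigma_y^2 + n_1^2\sigma_z^2 \\
&\quad + l_1 m_1(\sigma_x\sigma_y + \sigma_y\sigma_x) + m_1 n_1(\sigma_y\sigma_z + \sigma_z\sigma_y) + n_1 l_1(\sigma_z\sigma_x + \sigma_x\sigma_z) \\
&= l_1^2 + m_1^2 + n_1^2 = 1.
\end{aligned}$$

Again,

$$\begin{aligned}
\sigma_2\sigma_3 &= (l_2\sigma_x + m_2\sigma_y + n_2\sigma_z)(l_3\sigma_x + m_3\sigma_y + n_3\sigma_z) \\
&= l_2 l_3\sigma_x^2 + m_2 m_3\sigma_y^2 + n_2 n_3\sigma_z^2 + l_2 m_3\sigma_x\sigma_y + m_2 l_3\sigma_y\sigma_x + m_2 n_3\sigma_y\sigma_z \\
&\quad + n_2 m_3\sigma_z\sigma_y + n_2 l_3\sigma_z\sigma_x + l_2 n_3\sigma_x\sigma_z \\
&= l_2 l_3 + m_2 m_3 + n_2 n_3 + i(l_2 m_3 - m_2 l_3)\sigma_z + i(m_2 n_3 - n_2 m_3)\sigma_x + i(n_2 l_3 - l_2 n_3)\sigma_y \\
&= i(l_1\sigma_x + m_1\sigma_y + n_1\sigma_z) = i\sigma_1.
\end{aligned}$$

Thus $\sigma_1$, $\sigma_2$, $\sigma_3$ satisfy relations of the same form as (42).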
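The last two steps of this calculation use properties of the direction cosines that the text applies without stating. As a sketch, assuming the new axes 1, 2, 3 form a right-handed set (the handedness is an assumption; the text says only that they are mutually perpendicular), the relations needed are the orthogonality of axes 2 and 3 and the expression of axis 1 as their vector product:

```latex
\begin{aligned}
l_2 l_3 + m_2 m_3 + n_2 n_3 &= 0, \\
l_1 = m_2 n_3 - n_2 m_3, \qquad
m_1 &= n_2 l_3 - l_2 n_3, \qquad
n_1 = l_2 m_3 - m_2 l_3 .
\end{aligned}
```

The first relation removes the terms free of $\sigma$, and the second group turns the three bracketed coefficients into $l_1$, $m_1$, $n_1$. For a left-handed set the three vector-product relations change sign, and the same steps would give $\sigma_2\sigma_3 = -i\sigma_1$.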


We shall now obtain matrices to represent the spin observables $\sigma_x$, $\sigma_y$, $\sigma_z$.
These matrices need have only two rows and columns, since the observables they
represent have each only two eigenvalues. If we take a representation in which $\sigma_z$
is diagonal, then $\sigma_z$ will be represented by

$$\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$

Let $\sigma_x$ be represented by

$$\begin{pmatrix} a_1 & a_2 \\ a_3 & a_4 \end{pmatrix}.$$

Since $\sigma_x$ is a real observable this matrix must be Hermitian, so that $a_1$ and $a_4$ must
be real and $a_2$ and $a_3$ conjugate complex numbers. The equation $\sigma_z\sigma_x = -\sigma_x\sigma_z$
now gives us

$$\begin{pmatrix} a_1 & a_2 \\ -a_3 & -a_4 \end{pmatrix} = -\begin{pmatrix} a_1 & -a_2 \\ a_3 & -a_4 \end{pmatrix},$$

so that $a_1 = a_4 = 0$. Hence $\sigma_x$ is represented by a matrix of the form

$$\begin{pmatrix} 0 & a_2 \\ a_3 & 0 \end{pmatrix}.$$

The equation $\sigma_x^2 = 1$ now shows that $a_2 a_3 = 1$. Thus $a_2$ and $a_3$, being conjugate
complex numbers, must be of the form $e^{i\alpha}$ and $e^{-i\alpha}$ respectively, where $\alpha$ is a real
number, so that $\sigma_x$ is represented by a matrix of the form

$$\begin{pmatrix} 0 & e^{i\alpha} \\ e^{-i\alpha} & 0 \end{pmatrix}.$$

Similarly it may be shown that $\sigma_y$ is also represented by a matrix of this form.
By suitably choosing the phases in the representation, which is not completely
determined by the condition that $\sigma_z$ shall be diagonal, we can arrange that $\sigma_x$
shall be represented by the matrix

$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$

The representative of $\sigma_y$ is then determined by the equation $\sigma_y = i\sigma_x\sigma_z$. We thus
obtain finally the three matrices

$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$$

to represent $\sigma_x$, $\sigma_y$ and $\sigma_z$ respectively, which matrices satisfy all the algebraic
relations (42). The component of the spin vector $\boldsymbol\sigma$ in an arbitrary direction
specified by the direction cosines l, m & n is represented by

$$\begin{pmatrix} n & l - im \\ l + im & -n \end{pmatrix}. \tag{43}$$
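That the three matrices satisfy the relations (42) can be confirmed by direct multiplication. The sketch below does this with plain nested tuples; the helper names `matmul`, `scale` and `spin_component` are illustrative assumptions, not notation from the text.

```python
def matmul(p, q):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def scale(c, p):
    """Multiply every element of the matrix p by the number c."""
    return tuple(tuple(c * e for e in row) for row in p)

IDENTITY = ((1, 0), (0, 1))
SIGMA_X = ((0, 1), (1, 0))
SIGMA_Y = ((0, -1j), (1j, 0))
SIGMA_Z = ((1, 0), (0, -1))

def spin_component(l, m, n):
    """Matrix (43): l*sigma_x + m*sigma_y + n*sigma_z for direction cosines l, m, n."""
    return ((n, l - 1j * m), (l + 1j * m, -n))

# Example: sigma_y sigma_z should equal i sigma_x, the first relation of (42).
first_relation_holds = matmul(SIGMA_Y, SIGMA_Z) == scale(1j, SIGMA_X)
```

The same helpers show that the square of (43) is the unit matrix whenever $l^2 + m^2 + n^2 = 1$, which is the matrix form of the result $\sigma_1^2 = 1$ obtained above.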


In our representation with $\sigma_z$ diagonal, a state of spin will be represented
by a function $(\sigma_z'|)$ of the variable $\sigma_z'$, whose domain consists of only the two
points +1 & −1. This function is thus a pair of numbers. The state for which
$\sigma_z$ has the value unity will be represented by the function, $f_\alpha(\sigma_z')$ say, consisting of
the pair of numbers 1 & 0 and that for which it has the value −1 by the function,
$f_\beta(\sigma_z')$ say, consisting of the pair 0 & 1. Any function of the variable $\sigma_z'$, i.e. any
pair of numbers, can be expressed as a linear combination of these two. Thus any
state of spin can be obtained by superposition of the two states for which $\sigma_z$ equals
+1 and −1 respectively. For example, the state for which the component of $\boldsymbol\sigma$ in
the direction l, m, n, represented by (43), has the value 1 is represented by the pair
of numbers a, b which satisfy

$$\begin{aligned}
na + (l - im)b &= a, \\
(l + im)a - nb &= b.
\end{aligned}$$

This gives

$$\frac{a}{b} = \frac{l - im}{1 - n} = \frac{1 + n}{l + im}.$$

This state can be regarded as a superposition of the two states for which $\sigma_z$ equals
+1 and −1, the relative weights in the superposition process being as

$$|a|^2 : |b|^2 = |l - im|^2 : (1 - n)^2 = 1 + n : 1 - n.$$
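The pair (a, b) and the resulting weights can be computed for any direction. The sketch below picks the representative a = 1 + n, b = l + im, which has the ratio a/b given above; this choice of overall factor, the special handling of n = −1, and the function names are assumptions made for illustration.

```python
def spin_up_pair(l, m, n):
    """Un-normalised pair (a, b) for sigma-component +1 along direction (l, m, n)."""
    if n == -1:
        # Direction is along -z: the state is purely the sigma_z = -1 state.
        return 0j, 1 + 0j
    return complex(1 + n), complex(l, m)

def weights(l, m, n):
    """Normalised weights of the sigma_z = +1 and sigma_z = -1 states."""
    a, b = spin_up_pair(l, m, n)
    total = abs(a) ** 2 + abs(b) ** 2
    return abs(a) ** 2 / total, abs(b) ** 2 / total

def residual(l, m, n):
    """Largest violation of the two linear equations for (a, b); ideally zero."""
    a, b = spin_up_pair(l, m, n)
    r1 = n * a + (l - 1j * m) * b - a
    r2 = (l + 1j * m) * a - n * b - b
    return max(abs(r1), abs(r2))
```

For the direction l = 0.6, m = 0, n = 0.8 the weights come out in the ratio 1.8 : 0.2, i.e. as 1 + n : 1 − n.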


For the complete description of an electron we require the spin observables $\boldsymbol\sigma$
together with the Cartesian co-ordinates x, y, z and momenta $p_x$, $p_y$, $p_z$. The spin
observables are assumed to commute with these co-ordinates and momenta.
Thus a complete set of commuting observables for a system consisting of a single
electron will be x, y, z, $\sigma_z$. In a representation in which these are diagonal,
the representative of any state will be a function of four variables $x'$, $y'$, $z'$, $\sigma_z'$.
Since $\sigma_z'$ has a domain consisting of only two points, this function of four variables
is the same as two functions of three variables, namely the two functions

$$(x'y'z'|)_\alpha = (x', y', z', +1|), \qquad (x'y'z'|)_\beta = (x', y', z', -1|).$$

Thus the presence of the spin may be considered either as introducing a new variable
into the wave function representing a state or as giving this wave function two
components.


VII. MOTION IN A CENTRAL 
FIELD OF FORCE 


44. Properties of the Angular Momentum 


AN ATOM consists of a massive positively charged nucleus together with a number 
of electrons moving round it, under the influence of the attractive force of 
the nucleus and their own mutual repulsions. An exact treatment of this dynamical 
system would be a very difficult mathematical problem. One can, however, 
gain some insight into the main features of the system by making the rough 
approximation of regarding each electron as moving independently in a certain 
central field of force, namely that of the nucleus, assumed fixed, together with 
some kind of average of the forces due to the other electrons. Thus our present 
problem of the motion of a particle in a central field of force forms a corner-stone 
in the theory of the atom. 

Let the Cartesian co-ordinates of the particle, referred to a system of axes
with the centre of force as origin, be x, y, z and the corresponding components of
momentum $p_x$, $p_y$, $p_z$. They satisfy the quantum conditions

$$[x, y] = 0, \qquad [x, p_x] = 1, \qquad [x, p_y] = 0,$$

&c. The Hamiltonian, with neglect of relativity mechanics, will be of the form*

$$H = \frac{1}{2m}\cdot(p_x^2 + p_y^2 + p_z^2) + V, \tag{1}$$

where V, the potential energy, is a function only of $(x^2 + y^2 + z^2)$.
We now introduce the components of angular momentum defined, as in
the classical theory, by

$$m_x = y p_z - z p_y, \qquad m_y = z p_x - x p_z, \qquad m_z = x p_y - y p_x, \tag{2}$$

or by the vector equation

$$\mathbf{m} = \mathbf{x} \times \mathbf{p}.$$





*'·' replaces '.'

From these equations we obtain at once the identity

$$m_x x + m_y y + m_z z = 0. \tag{3}$$

We must now evaluate the P.B.'s of the angular momentum components with
the observables x, $p_x$, &c., and with each other. This we can do most conveniently
with the help of the laws (7) and (8) of §32, thus

$$[m_z, x] = [x p_y - y p_x, x] = -y[p_x, x] = y, \tag{4}$$

$$[m_z, y] = [x p_y - y p_x, y] = x[p_y, y] = -x,$$

$$[m_z, z] = [x p_y - y p_x, z] = 0, \tag{5}$$

and similarly

$$[m_z, p_x] = p_y, \qquad [m_z, p_y] = -p_x, \tag{6}$$

$$[m_z, p_z] = 0, \tag{7}$$

with corresponding relations for $m_x$ and $m_y$. Again

$$\begin{aligned}
[m_y, m_z] &= [z p_x - x p_z, m_z] = z[p_x, m_z] - [x, m_z]p_z \\
&= -z p_y + y p_z = m_x, \\
[m_z, m_x] &= m_y, \qquad [m_x, m_y] = m_z.
\end{aligned} \tag{8}$$

These results are all the same as in the classical theory. The sign in the results (4),
(6), and (8) may easily be remembered from the rule that the + sign occurs when
the three observables, consisting of the two in the P.B. on the left-hand side and
the one forming the result on the right, are in the cyclic order (xyz) and the − sign
occurs otherwise.

From (4) and (5) we obtain

$$\begin{aligned}
[m_z, x^2 + y^2 + z^2] &= x[m_z, x] + [m_z, x]x + y[m_z, y] + [m_z, y]y \\
&= xy + yx - yx - xy = 0.
\end{aligned} \tag{9}$$

Similarly from (6) and (7) we find

$$[m_z, p_x^2 + p_y^2 + p_z^2] = 0. \tag{10}$$

Thus $m_z$ commutes with $(x^2 + y^2 + z^2)$ and with $(p_x^2 + p_y^2 + p_z^2)$. It therefore
commutes with the Hamiltonian H which, according to (1), is a function of these
two observables only. Similarly $m_x$ and $m_y$ commute with H. Thus the angular
momentum is a constant of the motion, as in the classical theory.

Equations (8) may be put in the vector form 


$$\mathbf{m} \times \mathbf{m} = ih\,\mathbf{m}. \tag{11}$$
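The content of (11) is that the products of components fail to commute by $ih$ times the third component. As an illustration that is not taken from the text, the sketch below uses one particular set of 3×3 matrices, $(m_k)_{jl} = -ih\,\epsilon_{kjl}$ in units with h = 1, and checks that they satisfy the relations. The choice of this matrix set and all helper names are assumptions.

```python
H = 1  # units with h = 1 (an assumption)

def matmul(p, q):
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(p, q):
    return [[a + b for a, b in zip(rp, rq)] for rp, rq in zip(p, q)]

def scale(c, p):
    return [[c * e for e in row] for row in p]

def commutator(p, q):
    """p q - q p."""
    return add(matmul(p, q), scale(-1, matmul(q, p)))

# One matrix set obeying the angular-momentum relations: (m_k)_{jl} = -i h eps_{kjl}.
M_X = [[0, 0, 0], [0, 0, -1j], [0, 1j, 0]]
M_Y = [[0, 0, 1j], [0, 0, 0], [-1j, 0, 0]]
M_Z = [[0, -1j, 0], [1j, 0, 0], [0, 0, 0]]

def total_square():
    """m_x^2 + m_y^2 + m_z^2 for the matrices above."""
    return add(add(matmul(M_X, M_X), matmul(M_Y, M_Y)), matmul(M_Z, M_Z))
```

For these matrices `commutator(M_X, M_Y)` equals `scale(1j * H, M_Z)`, and cyclically, which is the component form of (11). The total square comes out as twice the unit matrix.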


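If we have several particles with angular momenta $\mathbf{m}_1$, $\mathbf{m}_2$, ..., each of them will
satisfy (11), thus

$$\mathbf{m}_r \times \mathbf{m}_r = ih\,\mathbf{m}_r.$$

Further, any one of these angular momenta will commute† with any other, so that

$$\mathbf{m}_r \times \mathbf{m}_s + \mathbf{m}_s \times \mathbf{m}_r = 0 \qquad (r \neq s).$$

Hence if $\mathbf{M} = \sum_r \mathbf{m}_r$ is the total angular momentum,

$$\begin{aligned}
\mathbf{M} \times \mathbf{M} &= \sum_{r,s} \mathbf{m}_r \times \mathbf{m}_s = \sum_r \mathbf{m}_r \times \mathbf{m}_r + \sum_{r<s} (\mathbf{m}_r \times \mathbf{m}_s + \mathbf{m}_s \times \mathbf{m}_r) \\
&= ih \sum_r \mathbf{m}_r = ih\,\mathbf{M}.
\end{aligned}$$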


This result is of the same form as (11), so that the components of the total 
angular momentum M of any number of particles satisfy the same commutability 
relations as those of the angular momentum of a single particle. Thus (11) or (8) 
may be regarded as the general commutability relations satisfied by any angular 
momentum. They certainly hold when the angular momentum is that of a number 
of particles, and may be assumed to hold also for the angular momentum of 
a spinning body, as was done in §43 for the spinning electron. 
We introduce the observable k defined as the positive square root

$$k = (m_x^2 + m_y^2 + m_z^2 + \tfrac14 h^2)^{1/2}. \tag{12}$$

Equations (8) show that our observables $m_x$, $m_y$, $m_z$, if measured in units which
make h = 1, satisfy just the same conditions as the $\alpha$, $\beta$, $\gamma$ of §30, the present k
corresponding to the k of §30. Thus we can apply the results of §30 directly to
our present observables. We obtain in this way that k commutes with $m_x$, $m_y$ &
$m_z$ and that its eigenvalues are integral or half odd integral multiples of h greater
than zero. Also for any eigenvalue $k'$ of k, the possible eigenvalues of $m_x$, $m_y$ or
$m_z$ are

$$k' - \tfrac12 h, \quad k' - \tfrac32 h, \quad k' - \tfrac52 h, \quad \ldots, \quad -k' + \tfrac12 h,$$

and are thus half odd integral or integral according as $k'$ is integral or half odd
integral. However, by using the further condition that $m_x$, $m_y$ & $m_z$ are of
the form (2) we can show that their eigenvalues must be integral and thus that
those of k must be half odd integral. We have, in fact, that $m_z$ is represented
by the operator $-ih(x\,\partial/\partial y - y\,\partial/\partial x)$, which, if one makes the transformation
$x = \rho\cos\phi$, $y = \rho\sin\phi$ to the cylindrical variables $\rho$, $\phi$, becomes the operator
$-ih\,\partial/\partial\phi$. The general eigenfunction of this operator is of the form $f(\rho)\,e^{i m_z' \phi/h}$,
$m_z'$ being the eigenvalue and $f(\rho)$ being an arbitrary function of $\rho$. Now it is
implied throughout our theory that an eigenfunction is a single-valued function
of its variables and hence $m_z'$ must be an integral multiple of h. Similarly it may
be shown that $m_x$ and $m_y$ have only integral eigenvalues. Thus the eigenvalues of
the components of angular momentum of a particle moving in an orbit must be
integral multiples of h, although those of the components of angular momentum in
the general case, which satisfy (8) but need not be of the form (2), may be either
integral or half odd integral. Those assumed in §43 for the components of spin
angular momentum of an electron were half odd integral.

[But the vector product ‘anticommutes’ surely?]
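The rule connecting an eigenvalue $k'$ of k with the possible eigenvalues of a component can be tabulated mechanically. The following sketch works in units of h and uses exact fractions; the function name and its input validation are assumptions made for illustration.

```python
from fractions import Fraction

def component_eigenvalues(k):
    """Eigenvalues k - 1/2, k - 3/2, ..., -k + 1/2 (units of h) for a given k.

    k must be a positive integral or half odd integral multiple of h.
    """
    k = Fraction(k)
    if k <= 0 or (2 * k).denominator != 1:
        raise ValueError("k must be a positive multiple of 1/2")
    top = k - Fraction(1, 2)
    count = int(2 * k)  # number of values from k - 1/2 down to -k + 1/2
    return [top - i for i in range(count)]
```

For example, k = 3/2 gives the integral values 1, 0, −1 of an orbital angular momentum, k = 1 gives the half odd integral values 1/2, −1/2 assumed for the electron spin, and k = 1/2 gives the single value 0.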

The components of angular momentum in different directions do not commute 
with each other, so that one cannot in general assign numerical values to them 
simultaneously. One can at most give a numerical value to the component 
in one particular direction. The state of the system will then be one which, 
in the language of Niels Bohr’s theory, is spatially quantized in that direction.
There is, however, one special case in which one can assign numerical values to
all the components simultaneously, namely, one can give them all the value zero,
since this will not contradict the commutability relations (8). The resulting state
of zero angular momentum, with $k = \tfrac12 h$, is then one that is spatially quantized
simultaneously in all directions. 


45. Transition to Polar Co-ordinates


For further discussion of the problem of motion in a central field of force it is
convenient to introduce polar observables. We introduce first the radius r, defined
as the positive square root

$$r = (x^2 + y^2 + z^2)^{1/2}.$$

If we evaluate its P.B.'s with $p_x$, $p_y$ and $p_z$, we obtain, with the help of formula (16)
of Chapter VI,

$$[r, p_x] = \frac{\partial r}{\partial x} = \frac{x}{r}, \qquad [r, p_y] = \frac{y}{r}, \qquad [r, p_z] = \frac{z}{r},$$

the same as in the classical theory. We could alternatively have evaluated these
P.B.'s by the method given in §39 for [x, H].

We now introduce the observable $p_r$ defined by

$$p_r = r^{-1}(x p_x + y p_y + z p_z - ih). \tag{13}$$

Its P.B. with r is given by†

$$\begin{aligned}
r[r, p_r] = [r, r p_r] &= [r, x p_x + y p_y + z p_z] \\
&= x[r, p_x] + y[r, p_y] + z[r, p_z] \\
&= x\cdot x/r + y\cdot y/r + z\cdot z/r = r.
\end{aligned}$$

Hence

$$[r, p_r] = 1$$

or

$$r p_r - p_r r = ih,$$





†'·' replaces '.'

so that $p_r$ is canonically conjugate to r. Now the eigenvalues of r, from its definition
as a positive square root, must be all positive or zero, so that we have obtained
a contradiction to the result, proved at the end of §19, that an observable can have
a canonical conjugate only if its eigenvalues include all numbers from $-\infty$ to $\infty$.
This inconsistency arises from the fact that the observable $p_r$ defined by (13) does
not strictly exist, since r has the eigenvalue zero so that $r^{-1}$ does not strictly exist.
In spite of this defect the observable $p_r$ is a useful one for the study of motion
in a central field of force. Our equations, which will often involve $p_r$ and will
sometimes involve $r^{-1}$ in other ways than through $p_r$, will be inaccurate, but only
in so far as they apply to the one point r = 0, and this is too small a region of
space to invalidate physical conclusions obtained from them.

The observable $p_r$ defined by (13) is a real one, since its conjugate complex $\bar p_r$
is given by

$$\begin{aligned}
\bar p_r\, r &= p_x x + p_y y + p_z z + ih \\
&= x p_x + y p_y + z p_z - 2ih \\
&= r p_r - ih = p_r r,
\end{aligned}$$

so that $\bar p_r = p_r$.

We can easily verify that our two new observables r and $p_r$ commute with
the angular momentum. Equation (9) shows us that $m_z$ commutes with $r^2$. It must
therefore commute also with r, since r is defined as a square-root function so that
everything that commutes with $r^2$ commutes also with r. Again, for $p_r$ we have

$$\begin{aligned}
r[p_r, m_z] = [r p_r, m_z] &= [x p_x + y p_y, m_z] \\
&= -y p_x - x p_y + x p_y + y p_x = 0.
\end{aligned}$$

Thus r and $p_r$ commute with $m_z$, and hence also with $m_x$ and $m_y$, and with k.
We can now express the Hamiltonian in terms of our radial observables r
and $p_r$ and also k. We have, if $\sum_{xyz}$ denotes a sum over cyclic permutations
of the suffixes x, y, z,

$$\begin{aligned}
k^2 - \tfrac14 h^2 &= \sum_{xyz} m_z^2 = \sum_{xyz} (x p_y - y p_x)^2 \\
&= \sum_{xyz} (x p_y x p_y + y p_x y p_x - x p_y y p_x - y p_x x p_y) \\
&= \sum_{xyz} (x^2 p_y^2 + y^2 p_x^2 - x p_x p_y y - y p_y p_x x + x^2 p_x^2 - x p_x p_x x - 2ih\,x p_x) \\
&= (x^2 + y^2 + z^2)(p_x^2 + p_y^2 + p_z^2) - (x p_x + y p_y + z p_z)(p_x x + p_y y + p_z z + 2ih) \\
&= r^2(p_x^2 + p_y^2 + p_z^2) - (r p_r + ih)\,r p_r \\
&= r^2(p_x^2 + p_y^2 + p_z^2) - r^2 p_r^2.
\end{aligned}$$

Hence

$$H = \frac{1}{2m}\left(p_r^2 + \frac{k^2 - \tfrac14 h^2}{r^2}\right) + V. \tag{14}$$
This form for H is such that k commutes not only with H, as is necessary since k
is a constant of the motion, but also with every observable occurring in H, namely
both r and $p_r$. Thus in dealing with the Hamiltonian in this form we can treat k
as a number. The permissible numbers we can take for k are its eigenvalues and
are thus positive half odd integral multiples of h. If we write down the Schrödinger
equation for the stationary states, it will now read

$$\left\{\frac{1}{2m}\left(-h^2\frac{d^2}{dr^2} + \frac{k^2 - \tfrac14 h^2}{r^2}\right) + V\right\}(r|) = H'(r|), \tag{15}$$

the single variable r in the wave function $(r|)$ being sufficient when k is
counted as a number. Any value of the parameter $H'$ for which this equation,
with a permissible value for k, has a solution (satisfying the boundary conditions
to be discussed later) is a possible energy-level of the system. The energy-levels
(except those for which $k = \tfrac12 h$) are all degenerate and belong each to several
independent stationary states, corresponding to the various possible eigenvalues
of a Cartesian component of the angular momentum. The number of these states,
for any value of k, is the odd number 2k/h.

If we write down the Schrödinger equation in the original Cartesian co-ordinates
x, y, z, we shall have

$$\left\{-\frac{h^2}{2m}\nabla^2 + V\right\}(xyz|) = H'(xyz|), \tag{16}$$

where $\nabla^2$ is the Laplacian operator $\partial^2/\partial x^2 + \partial^2/\partial y^2 + \partial^2/\partial z^2$. This becomes,
on transforming to polar co-ordinates r, $\theta$, $\phi$,

$$\left\{-\frac{h^2}{2m}\left(\frac{\partial^2}{\partial r^2} + \frac{2}{r}\frac{\partial}{\partial r} + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\sin\theta\frac{\partial}{\partial\theta} + \frac{1}{r^2\sin^2\theta}\frac{\partial^2}{\partial\phi^2}\right) + V\right\}(r\theta\phi|) = H'(r\theta\phi|).$$

The solutions of this equation are of the form

$$(r\theta\phi|) = \chi(r)\,S_n(\theta, \phi),$$

where $S_n$ is a spherical harmonic of order n satisfying

$$\left\{\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\sin\theta\frac{\partial}{\partial\theta} + \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial\phi^2}\right\}S_n(\theta, \phi) = -n(n+1)\,S_n(\theta, \phi),$$

n being an integer, and $\chi(r)$ is a function of r only, satisfying

$$\left\{-\frac{h^2}{2m}\left(\frac{d^2}{dr^2} + \frac{2}{r}\frac{d}{dr} - \frac{n(n+1)}{r^2}\right) + V\right\}\chi(r) = H'\chi(r). \tag{17}$$

This equation, like (15), is such that the values of $H'$ for which it has a solution
are the energy-levels of the system.

The equivalence of equations (15) and (17) may be seen from the fact that
if in (15) we put $(r|) = r\chi(r)$ we obtain just equation (17) with $n = k/h - \tfrac12$.
The fact that the two eigenfunctions $(r|)$ and $\chi(r)$ are not identical but differ by
this factor r is due to their different physical interpretations. A solution $(r|)$ of (15)
represents a state for which the probability of the particle lying in the spherical
shell between r and r + dr is proportional to $|(r|)|^2\,dr$. On the other hand,
a solution $(xyz|)$ of (16) represents a state for which the probability of the particle
lying in a small volume dx dy dz is proportional to $|(xyz|)|^2\,dx\,dy\,dz$ or $|\chi(r)S_n(\theta, \phi)|^2\,dx\,dy\,dz$,
so that the probability of its lying in the spherical shell between r and r + dr is
proportional to $|\chi(r)|^2 r^2\,dr$. Thus the physical interpretations require $(r|)$ to be
proportional to $r\chi(r)$.
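The substitution that connects (15) and (17) can be written out in two short steps. This is a sketch of the intermediate algebra under the assumption that the radial momentum acts on $(r|)$ as $-ih\,d/dr$, so that (15) contains $-h^2\,d^2/dr^2$:

```latex
\begin{aligned}
\frac{d^2}{dr^2}\bigl(r\chi\bigr) &= r\left(\frac{d^2\chi}{dr^2} + \frac{2}{r}\frac{d\chi}{dr}\right), \\
k = \bigl(n + \tfrac12\bigr)h \quad\Longrightarrow\quad
k^2 - \tfrac14 h^2 &= n(n+1)\,h^2 .
\end{aligned}
```

Inserting both results into (15) with $(r|) = r\chi(r)$ gives r times the left-hand side of (17) on the left and $H' r\chi(r)$ on the right, so the common factor r cancels and (17) follows.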

It should be noticed that not every solution of (17), when multiplied by
the appropriate spherical harmonic, will give a solution of (16), as it may fail
to satisfy (16) at the origin. One can see most clearly how this comes about
by considering the special case for which the potential V vanishes, giving us
the problem of the free particle. If we further take $H' = 0$, equation (16) reduces to

$$\nabla^2(xyz|) = 0 \tag{18}$$

and equation (17) to

$$\left(\frac{d^2}{dr^2} + \frac{2}{r}\frac{d}{dr} - \frac{n(n+1)}{r^2}\right)\chi(r) = 0. \tag{19}$$

Now a solution of (19) for n = 0 is $\chi(r) = 1/r$, but this solution multiplied by
the appropriate spherical harmonic $S_0 = 1$ does not satisfy (18), since, although
$\nabla^2(1/r)$ vanishes for any finite value of r, its integral through any volume about
the origin is $-4\pi$, and hence

$$\nabla^2(1/r) = -4\pi\,\delta(x)\,\delta(y)\,\delta(z).$$

Thus the solution $\chi(r) = 1/r$ of (19) does not represent a stationary state of
the system. Again the solution $\chi(r) = 1/r^2$ of (19) for n = 1, when multiplied
by the spherical harmonic $S_1 = \cos\theta$, gives a wave function $(xyz|)$, the integral
of the square of whose modulus over any volume, however small, that contains
the origin is infinite. This wave function must represent a state for which
the particle is certainly at the origin and this cannot be a stationary state of
zero energy for the problem of the free particle. Similarly for arbitrary n in
equation (19), of the two solutions $\chi(r) = r^n$ and $\chi(r) = r^{-n-1}$, the second will
not give the representative of a stationary state of the system.

It thus appears that equation (17) is not adequate to replace equation (16)
as the necessary and sufficient condition for the representative of a stationary
state. Equation (17) must be supplemented by a suitable boundary condition at
the point r = 0. Any solution $\chi(r)$ of (17) for which the integral $\int_0 r^2|\chi(r)|^2\,dr$
is not convergent must certainly be rejected, and also some for which this integral
is convergent, namely those which, when operated on by $\nabla^2$, give an infinite result
involving the $\delta$ function at the origin. These conditions show that only those
solutions are to be allowed which, if they tend to infinity as $r \to 0$, do so more
slowly than 1/r. The corresponding boundary condition for the function $(r|)$ of
equation (15) is that it shall tend to zero as $r \to 0$.

There are also boundary conditions for the eigenfunction at $r \to \infty$.† If we are
interested only in ‘closed’ states, i.e. states for which the particle does not go off
to infinity, we must restrict the integral $\int^\infty |(r|)|^2\,dr$ or $\int^\infty r^2|\chi(r)|^2\,dr$ to be
convergent. These closed states, however, are not the only ones that are physically
permissible, as we can also have states in which the particle arrives from infinity,
is scattered by the central field of force, and goes off to infinity again. For these
states the wave function $(xyz|)$ may remain finite as $r \to \infty$. Such states will
be dealt with in Chapter X under the heading of collision problems. In any case
the eigenfunction $(xyz|)$ must not tend to infinity as $r \to \infty$, or it will represent
a state that has no physical meaning.
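The two power-law solutions of (19) and the boundary condition at r = 0 can be summarised in a few lines of arithmetic. The sketch below treats a trial solution $\chi(r) = r^p$; the function names are assumptions for illustration, and the three tests simply restate the criteria given in the text.

```python
def indicial_factor(p, n):
    """Coefficient of r**(p - 2) when the operator of (19) acts on r**p."""
    return p * (p - 1) + 2 * p - n * (n + 1)

def norm_converges_at_origin(p):
    """True if the integral of r^2 |r^p|^2 dr converges at its lower limit r = 0."""
    return 2 + 2 * p > -1

def allowed_at_origin(p):
    """Boundary condition of the text: r**p may diverge only more slowly than 1/r."""
    return p > -1
```

The factor vanishes for p = n and for p = −n − 1, which are the two solutions quoted. The case n = 0, p = −1 is the instructive one: the normalisation integral converges at the origin, yet the solution is still rejected because of the $\delta$ function it produces under $\nabla^2$.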


46. Energy-levels of the Hydrogen Atom 


The above analysis may be applied to the problem of the hydrogen atom with
neglect of the relativity variation of mass with velocity and the spin of the electron.
The potential energy V is now $-e^2/r$, so that equation (15) becomes

$$\left\{\frac{d^2}{dr^2} - \frac{k^2 - \tfrac14}{r^2} + \frac{2me^2}{h^2}\frac{1}{r}\right\}(r|) = -\frac{2mH'}{h^2}(r|), \tag{20}$$

when written in terms of a new observable k, equal to $h^{-1}$ times the previous k.
A thorough investigation of this equation has been given by Erwin Schrödinger.†
We shall here obtain its eigenvalues $H'$ from a consideration of its eigenfunctions
expressed in the form of power series.
It is convenient to put

$$(r|) = f(r)\,e^{-r/a}, \tag{21}$$


In the question about the physical existence of infinity the answer has been avoided by 
allowing it to be a limit to which the position would tend. ‘—’ replaces ‘=’ 

†Schrödinger, E. (1926). Quantisierung als Eigenwertproblem. Annalen der Physik, 384(4),
361-376. doi:10.1002/andp.19263840404

introducing the new function f(r), where a is one or other of the square roots

$$a = \pm\sqrt{-h^2/2mH'}. \tag{22}$$

Equation (20) now becomes

$$\left\{\frac{d^2}{dr^2} - \frac{2}{a}\frac{d}{dr} - \frac{k^2 - \tfrac14}{r^2} + \frac{2me^2}{h^2}\frac{1}{r}\right\}f(r) = 0. \tag{23}$$

We look for a solution of this equation in the form of a power series

$$f(r) = \sum_s c_s r^s, \tag{24}$$

in which consecutive values for s differ by unity although these values themselves
need not be integers. On substituting (24) in (23) we obtain*

$$\sum_s c_s\left\{s(s-1)\,r^{s-2} - (2s/a)\cdot r^{s-1} - (k^2 - \tfrac14)\,r^{s-2} + (2me^2/h^2)\cdot r^{s-1}\right\} = 0,$$

which gives, on equating to zero the coefficient of $r^{s-2}$, the following relation
between successive coefficients $c_s$,

$$c_s[s(s-1) - (k^2 - \tfrac14)] = c_{s-1}[2(s-1)/a - 2me^2/h^2]. \tag{25}$$

We saw in the preceding section that only those eigenfunctions $(r|)$ are allowed
that tend to zero with r and hence from (21) f(r) must tend to zero with r.
The series (24) must therefore terminate on the side of small s and the minimum
value of s must be greater than zero. Now the only possible minimum values of
s are those that make the coefficient of $c_s$ in (25) vanish, i.e. $k + \tfrac12$ and $-k + \tfrac12$,
and the second of these is negative or zero. Thus the minimum value of s must
be $k + \tfrac12$. Since k is always half an odd integer, the values of s will all be integers.
The series (24) will in general extend to infinity on the side of large s. For large
values of s the ratio of successive terms is

$$\frac{c_s}{c_{s-1}}\,r = \frac{2r}{sa}$$

according to (25). Thus the series (24) will always converge, as the ratios of
the higher terms to one another are the same as for the series

$$\sum_s \frac{1}{s!}\left(\frac{2r}{a}\right)^s, \tag{26}$$

which converges to $e^{2r/a}$.
We must now examine how our solution (r|) behaves for large values of r. 
We must distinguish between the two cases of H’ positive and H’ negative. 
For H’ negative, a given by (22) will be real. Suppose we take the positive value 
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for a. Then as r + oo the sum of the series (20) will tend to infinity according to 
the same law as the sum of the series (26), i.e. the law e?"/“ Thus from (21) (r]) 
will tend to infinity according to the law e’/* and will not represent a physically 
possible state. There is therefore in general no permissible solution of (20) for 
negative values of H’ An exception arises, however, whenever the series (24) 
terminates on the side of large s, in which case the boundary conditions are all 
satisfied. The condition for this termination of the series is that the coefficient of 
Cs—1 in (25) shall vanish for some value of the suffix s—1 not less than its minimum 
value k +4, which is the same as the condition that 


sme 
i 
for some integer s not less than k+4. With the help of (22) this condition becomes 


F mes 


1s 35252? (27) 
and is thus a condition for the energy-level H’. Since s may be any positive 
integer, the formula (27) gives a discrete set of negative energy-levels for 
the hydrogen atom. These are in agreement with experiment. Each of them 
(except the lowest one s = 1) is degenerate, as it may occur with various possible 
values for k, namely, any positive half odd integer less than s. This degeneracy 
is in addition to that mentioned in the preceding section arising from the various 
possible values for a component of angular momentum, which degeneracy occurs 
with any central field of force. The k degeneracy occurs only with an inverse
square law of force and even then is removed when one takes relativity mechanics
into account, as will be found in Chapter XIII. The solution of (20) when
H′ satisfies (27) tends to zero exponentially as r → ∞ and thus represents
a closed state, corresponding to an elliptic orbit in Bohr’s theory.
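
The discrete levels (27) and their k degeneracy can be tabulated with a short Python sketch. This is an illustration only and not part of the original text: the function names `energy_level` and `allowed_k` are hypothetical, energies are measured in units of me⁴/h² (h being the constant used throughout this chapter), and k is measured in units of h.

```python
from fractions import Fraction


def energy_level(s):
    """Energy H' of level s from formula (27), in units of m e^4 / h^2 (assumed units)."""
    return Fraction(-1, 2 * s * s)


def allowed_k(s):
    """Values of k, in units of h, that can occur with level s.

    These are the positive half odd integers less than s,
    i.e. those for which s is not less than k + 1/2.
    """
    return [Fraction(2 * n + 1, 2) for n in range(s)]


if __name__ == "__main__":
    for s in range(1, 5):
        ks = ", ".join(str(k) for k in allowed_k(s))
        print(f"s = {s}: H' = {energy_level(s)}, k in {{{ks}}}")
```

Under these assumptions the lowest level s = 1 admits only k = ½ and is therefore not degenerate in k, while a level s admits s different values of k.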

For any positive values of H′, a given by (22) will be imaginary. The series (24),
which is roughly the same as the series (26), will now have a sum that remains
finite as r → ∞. Thus (r|) given by (21) will now remain finite as r → ∞
and will therefore be a permissible solution of (20), since it will correspond to
an eigenfunction (xyz|) that tends to zero according to the law 1/r as r → ∞.
Hence in addition to the discrete set of negative energy-levels (27), all positive
energy-levels are allowed. The states of positive energy are not closed, since their
representatives (r|) do not make the integral \int^{\infty} |(r|)|^2 \, dr converge. These states
correspond to the hyperbolic orbits of Bohr’s theory.


47. Selection Rules


When 𝐃, the total electric displacement of a system, is represented in a Heisenberg
representation, it often happens that a great many of its matrix elements,
(α′|𝐃|α″) say, vanish. In fact they may all vanish except those for which the α′’s
and α″’s are connected in a certain way. When this is the case, according
to Werner Heisenberg’s interpretation of the matrix elements, a transition of
the system with emission of radiation can take place only between two stationary
states whose labels α′ and α″ are connected in this way. There is then,
as we say, a selection rule for the α’s, only certain selected transitions being
allowed. In general we must consider separately the different Cartesian components
D_x, D_y, D_z of 𝐃 and obtain for each of them the condition that its matrix element
(α′|D|α″) shall not vanish. We shall then often find that for those transitions
α′ → α″ which can take place, i.e. for which the vector (α′|𝐃|α″) does not vanish,
some of the Cartesian components (α′|D_x|α″), (α′|D_y|α″), (α′|D_z|α″) do vanish.
There will then be conditions on the direction of emission and state of polarization
of the emitted radiation, which conditions, according to Heisenberg’s assumption,
will be the same as the classical ones for the radiation emitted by an electric dipole
whose magnitude and direction are given by the vector

\[ (\alpha'|\mathbf{D}|\alpha'') + (\alpha''|\mathbf{D}|\alpha'). \]


There is a general method for obtaining all selection rules, which is as follows.
Let D be one of the Cartesian components of 𝐃. We must obtain an algebraic
equation connecting D and the α’s which does not involve any observables other
than D and the α’s and which is linear in D. Such an equation will be of the form

\[ \sum_r f_r D g_r = 0, \tag{28} \]

where the f_r’s and g_r’s are functions of the α’s only. When this equation is expressed
in terms of representatives, it gives us

\[ \sum_r f_r(\alpha')\,(\alpha'|D|\alpha'')\,g_r(\alpha'') = 0, \]
or
\[ (\alpha'|D|\alpha'') \sum_r f_r(\alpha')\,g_r(\alpha'') = 0, \]

which shows that (α′|D|α″) = 0 unless

\[ \sum_r f_r(\alpha')\,g_r(\alpha'') = 0. \tag{29} \]

This last equation, giving the connexion which must exist between α′ and α″
in order that (α′|D|α″) may not vanish, constitutes the selection rule, so far as
the component D of 𝐃 is concerned.

We shall now obtain the selection rules for m_z and k for an electron moving
in a central field of force. The components of electric displacement are here
proportional to the Cartesian co-ordinates x, y, z. Taking first m_z, we have that
m_z commutes with z, or that

\[ m_z z - z m_z = 0. \]

This is an equation of the required type (28), giving us the selection rule

\[ m_z' - m_z'' = 0 \]

for the z-component of the displacement. Again, from equations (8) we have

\[ [m_z, [m_z, x]] = [m_z, y] = -x \]
or
\[ m_z^2 x - 2 m_z x m_z + x m_z^2 - h^2 x = 0, \]

which is also of the type (28) and gives us the selection rule

\[ {m_z'}^2 - 2 m_z' m_z'' + {m_z''}^2 - h^2 = 0 \]
or
\[ (m_z' - m_z'' - h)(m_z' - m_z'' + h) = 0 \]

for the x-component of the displacement. The selection rule for the y-component
is the same. Thus our selection rules for m_z are that for the emission of radiation
with a polarization corresponding to an electric dipole in the z-direction, m_z cannot
change, while for that corresponding to an electric dipole in the x-direction or
y-direction, m_z must change by ±h.

We can determine more accurately the state of polarization of the radiation
emitted with a transition in which m_z changes by ±h, by considering the condition
for the non-vanishing of matrix elements of x + iy and x − iy. We have

\[ [m_z, x + iy] = y - ix = -i(x + iy) \]
or
\[ m_z(x + iy) - (x + iy)(m_z + h) = 0, \]

which is again of the type (28). It gives

\[ m_z' - m_z'' - h = 0 \]

as the condition that (m_z′|x + iy|m_z″) shall not vanish. Similarly

\[ m_z' - m_z'' + h = 0 \]

is the condition that (m_z′|x − iy|m_z″) shall not vanish. Hence

\[ (m_z'|x - iy|m_z' - h) = 0 \]
or
\[ (m_z'|x|m_z' - h) = i(m_z'|y|m_z' - h) = (a + ib)e^{i\omega t} \]

say, a, b and ω being real, and similarly

\[ (m_z' - h|x|m_z') = -i(m_z' - h|y|m_z') = (a - ib)e^{-i\omega t}. \]

Thus the vector (m_z′|𝐃|m_z′ − h) + (m_z′ − h|𝐃|m_z′), which determines the state of
polarization of the radiation emitted with transitions for which m_z″ = m_z′ − h, has
the following three components

\[ \begin{aligned}
(m_z'|x|m_z' - h) + (m_z' - h|x|m_z') &= (a + ib)e^{i\omega t} + (a - ib)e^{-i\omega t} = 2a\cos\omega t - 2b\sin\omega t, \\
(m_z'|y|m_z' - h) + (m_z' - h|y|m_z') &= -i(a + ib)e^{i\omega t} + i(a - ib)e^{-i\omega t} = 2a\sin\omega t + 2b\cos\omega t, \\
(m_z'|z|m_z' - h) + (m_z' - h|z|m_z') &= 0.
\end{aligned} \tag{30} \]
From the form of these components we see that radiation emitted in the z-direction
will be circularly polarized, that emitted in any direction in the xy plane will be
linearly polarized in this plane, and that emitted in intermediate directions will be
elliptically polarized. The direction of circular polarization for radiation emitted
in the z-direction will depend on whether ω is positive or negative, and this will
depend on which of the two states m_z′ or m_z″ = m_z′ − h has the greater energy.
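
The components (30) can be checked numerically with a small Python sketch. This is an illustration under stated assumptions rather than anything taken from the text: the function name `dipole_components` and the sample values of a, b, ω and t are hypothetical. The sketch forms (a + ib)e^{iωt} plus its conjugate partner exactly as in (30) and confirms that the x and y components trace a circle of radius 2(a² + b²)^½, which is the circular polarization seen along the z-direction.

```python
import cmath
import math


def dipole_components(a, b, omega, t):
    """Return the three real components of the vector in (30) at time t."""
    amp = complex(a, b) * cmath.exp(1j * omega * t)   # (a + ib) e^{i omega t}
    conj = amp.conjugate()                            # (a - ib) e^{-i omega t}
    x = amp + conj                                    # 2a cos(wt) - 2b sin(wt)
    y = -1j * amp + 1j * conj                         # 2a sin(wt) + 2b cos(wt)
    return x.real, y.real, 0.0


if __name__ == "__main__":
    a, b, omega = 0.3, -0.7, 2.0                      # arbitrary sample values
    for t in (0.0, 0.4, 1.1):
        x, y, z = dipole_components(a, b, omega, t)
        # x^2 + y^2 stays equal to 4(a^2 + b^2): the vector rotates in the xy plane.
        print(t, x, y, z, math.hypot(x, y))
```

Reversing the sign of ω in this sketch reverses the sense of rotation, in line with the remark that the direction of circular polarization depends on the sign of ω.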

We shall now determine the selection rule for k. We have

\[ \begin{aligned}
[k^2, z] &= [m_x^2, z] + [m_y^2, z] \\
&= -y m_x - m_x y + x m_y + m_y x \\
&= 2(m_y x - m_x y + ihz) \\
&= 2(m_y x - y m_x) = 2(x m_y - m_x y).
\end{aligned} \]

Similarly
\[ [k^2, x] = 2(y m_z - m_y z) \]
and
\[ [k^2, y] = 2(m_x z - x m_z). \]

Hence

\[ \begin{aligned}
[k^2, [k^2, z]] &= 2[k^2, m_y x - m_x y + ihz] \\
&= 2m_y[k^2, x] - 2m_x[k^2, y] + 2ih[k^2, z] \\
&= 4m_y(y m_z - m_y z) - 4m_x(m_x z - x m_z) + 2(k^2 z - z k^2) \\
&= 4(m_x x + m_y y + m_z z)m_z - 4(m_x^2 + m_y^2 + m_z^2)z + 2(k^2 z - z k^2).
\end{aligned} \]

The first term here vanishes, from (3), leaving us with

\[ \begin{aligned}
[k^2, [k^2, z]] &= -4(k^2 - \tfrac{1}{4}h^2)z + 2(k^2 z - z k^2) \\
&= -2(k^2 z + z k^2) + h^2 z,
\end{aligned} \]

which gives

\[ k^4 z - 2k^2 z k^2 + z k^4 - 2h^2(k^2 z + z k^2) + h^4 z = 0. \tag{31} \]
Similar equations hold for x and y. These equations are of the required type (28),
and give us the selection rule

\[ k'^4 - 2k'^2 k''^2 + k''^4 - 2h^2(k'^2 + k''^2) + h^4 = 0 \]
or
\[ (k' + k'' + h)(k' + k'' - h)(k' - k'' + h)(k' - k'' - h) = 0. \]

A transition can take place between two states k′ and k″ only if one of these four
factors vanishes.
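
The factorization used in this step can be verified with exact rational arithmetic. The Python sketch below is illustrative only: the names `quartic`, `factored`, `ks` and `allowed` are hypothetical, h is set equal to 1, and k is assumed to run over the first few positive half odd integers. It checks that the quartic selection rule and the product of four factors agree, and lists the pairs (k′, k″) for which they vanish.

```python
from fractions import Fraction
from itertools import product


def quartic(k1, k2, h):
    """Left-hand side of the selection rule obtained from (31)."""
    return k1**4 - 2 * k1**2 * k2**2 + k2**4 - 2 * h**2 * (k1**2 + k2**2) + h**4


def factored(k1, k2, h):
    """The same expression written as a product of four linear factors."""
    return (k1 + k2 + h) * (k1 + k2 - h) * (k1 - k2 + h) * (k1 - k2 - h)


h = Fraction(1)                                        # units in which h = 1 (assumption)
ks = [Fraction(2 * n + 1, 2) for n in range(5)]        # k = 1/2, 3/2, ..., 9/2
allowed = [(k1, k2) for k1, k2 in product(ks, ks) if quartic(k1, k2, h) == 0]

if __name__ == "__main__":
    for k1, k2 in allowed:
        print(k1, "->", k2)
```

In this sketch the pair k′ = k″ = ½ appears among the zeros, coming from the factor (k′ + k″ − h); all other zeros have k′ and k″ differing by exactly h.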

Now the first of the factors, (k′ + k″ + h), can never vanish since the eigenvalues
of k are all positive. The second, (k′ + k″ − h), can vanish only if k′ = ½h
and k″ = ½h. But transitions between two states with these values for k cannot
occur on account of the selection rule for m_z, as may be seen from the following
argument. If two states (labelled respectively with a single prime and a double
prime) are such that k′ = ½h and k″ = ½h, then, according to the discussion at the
end of §44, each Cartesian component of the angular momentum must vanish for
each of them, i.e. m_x′ = m_y′ = m_z′ = 0 and m_x″ = m_y″ = m_z″ = 0. The selection
rule for m_z now shows that the matrix elements of x and y referring to the two
states must vanish, as the value of m_z does not change during the transition,
and the similar selection rule for m_x or m_y shows that the matrix element of z also
vanishes. Thus transitions between the two states cannot occur. Our selection
rule for k now reduces to

\[ (k' - k'' + h)(k' - k'' - h) = 0, \]

showing that k must change by ±h. This selection rule may be written

\[ k'^2 - 2k'k'' + k''^2 - h^2 = 0, \]

and since this is the condition that a matrix element (k′|z|k″) shall not vanish,
we get the equation

\[ k^2 z - 2kzk + zk^2 - h^2 z = 0 \]
or
\[ [k, [k, z]] = -z, \tag{32} \]

a result which could not easily be obtained in a more direct way.


48. The Zeeman Effect for the Hydrogen Atom 


We shall now consider the system of a hydrogen atom in a uniform magnetic field.
The Hamiltonian (1) with V = −e²/r, which describes the hydrogen atom in no
external field, gets modified by the magnetic field, the modification, according to
classical mechanics, consisting in the replacement of the components of momentum,
p_x, p_y, p_z, by† p_x + e/c·A_x, p_y + e/c·A_y, p_z + e/c·A_z, where A_x, A_y, A_z are
the components of the vector potential describing the field. For a uniform field of
magnitude ℋ in the direction of the z-axis we may take A_x = −½ℋy, A_y = ½ℋx,
A_z = 0. The classical Hamiltonian will then be

\[ H = \frac{1}{2m}\left\{\left(p_x - \frac{1}{2}\frac{e}{c}\mathcal{H}y\right)^2 + \left(p_y + \frac{1}{2}\frac{e}{c}\mathcal{H}x\right)^2 + p_z^2\right\} - \frac{e^2}{r}. \]

† ‘·’ replaces ‘.’

This classical Hamiltonian may be taken over into the quantum theory if we add
on to it a term giving the effect of the spin of the electron. The electron has
a magnetic moment −eh/2mc·𝛔 whose energy in the magnetic field will be
ehℋ/2mc·σ_z. Thus the quantum Hamiltonian will be

\[ H = \frac{1}{2m}\left\{\left(p_x - \frac{1}{2}\frac{e}{c}\mathcal{H}y\right)^2 + \left(p_y + \frac{1}{2}\frac{e}{c}\mathcal{H}x\right)^2 + p_z^2\right\} - \frac{e^2}{r} + \frac{eh\mathcal{H}}{2mc}\sigma_z. \tag{33} \]

There ought strictly to be other terms in this Hamiltonian giving the interaction
of the magnetic moment of the electron with the electric field of the nucleus of
the atom, but this effect is small, of the same order of magnitude as that of
the relativity variation of the mass of the electron with its velocity, and will be
neglected here. It will be taken into account in the relativity theory of the electron
given in Chapter XIII.

If the magnetic field is not too large, we can neglect terms involving ℋ², so that
the Hamiltonian (33) reduces to

\[ \begin{aligned}
H &= \frac{1}{2m}(p_x^2 + p_y^2 + p_z^2) - \frac{e^2}{r} + \frac{e\mathcal{H}}{2mc}(x p_y - y p_x) + \frac{eh\mathcal{H}}{2mc}\sigma_z \\
&= \frac{1}{2m}(p_x^2 + p_y^2 + p_z^2) - \frac{e^2}{r} + \frac{e\mathcal{H}}{2mc}(m_z + h\sigma_z).
\end{aligned} \tag{34} \]


The extra terms due to the magnetic field are now eℋ/2mc·(m_z + hσ_z).
But these extra terms commute with the total Hamiltonian and are thus constants
of the motion. This makes the problem very easy. The stationary states of
the system, i.e. the eigenstates of the Hamiltonian (34), will be those eigenstates of
the Hamiltonian for no field that are simultaneously eigenstates of the observables
m_z and σ_z, or at least of the one observable m_z + hσ_z, and the energy-levels
of the system will be those for the system with no field, given by (27) if one
considers only closed states, increased by an eigenvalue of eℋ/2mc·(m_z + hσ_z).
Thus any stationary state of the system with no field which is spatially quantized in
the z-direction, i.e. for which m_z has the numerical value m_z′, an integral multiple of
h, and for which also σ_z has the numerical value σ_z′ = ±1, will still be a stationary
state when the field is applied. Its energy will be increased by an amount consisting
of the sum of two parts, a part eℋ/2mc·m_z′ arising from the orbital motion,
which may be considered as due to an orbital magnetic moment −em_z′/2mc,
and a part eℋ/2mc·hσ_z′ arising from the spin. The ratio of the orbital magnetic
moment to the orbital angular momentum m_z is −e/2mc, which is half the ratio of
the spin magnetic moment to the spin angular momentum. This fact is sometimes
referred to as the magnetic anomaly of the spin.

Since the energy-levels now involve m_z, the selection rule for m_z obtained
in the preceding section becomes capable of direct comparison with experiment.
According to this selection rule, m_z can change by h, 0 or −h during
an emission process. This means that the amount of energy emitted will differ
by −ehℋ/2mc, 0 or ehℋ/2mc respectively from the amount emitted when
there is no field, since σ_z will not change as it commutes with the electric
displacement of the system. Thus the frequency of the emitted radiation will differ
by −eℋ/4πmc, 0 or eℋ/4πmc from that for no field, so that each spectral line
for no field gets split up into three components. If one considers the radiation 
emitted in the z-direction, then from (30) the two outer components will be 
circularly polarized while the central undisplaced one will be of zero intensity. 
These results are in agreement with experiment and also with the classical theory 
of the Zeeman effect. The agreement with the classical theory ceases, however, 
when one takes into account relativity mechanics and the interaction of the spin 
with the electric field of the nucleus. 
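
To give a feeling for the size of the splitting, the frequency shifts −eℋ/4πmc, 0, eℋ/4πmc can be evaluated with a short Python sketch. Everything specific here is an assumption made for illustration: the function name `zeeman_shifts` is hypothetical, Gaussian units are assumed, and the rounded values of e, m and c are supplied by hand rather than taken from the text.

```python
import math

# Rounded constants in Gaussian (CGS) units; assumed values for illustration.
E_CHARGE = 4.803e-10    # electron charge e, in esu
E_MASS = 9.109e-28      # electron mass m, in grams
C_LIGHT = 2.998e10      # velocity of light c, in cm/s


def zeeman_shifts(field_gauss):
    """Frequency shifts (in Hz) of the three Zeeman components for a field in gauss."""
    dnu = E_CHARGE * field_gauss / (4 * math.pi * E_MASS * C_LIGHT)
    # m_z changes by h, 0 or -h, giving shifts -dnu, 0, +dnu respectively.
    return (-dnu, 0.0, dnu)


if __name__ == "__main__":
    for field in (1.0e3, 1.0e4):
        print(field, zeeman_shifts(field))
```

With these assumed constants a field of 10⁴ gauss displaces the outer components by roughly 1.4 × 10¹⁰ cycles per second, which is small compared with optical frequencies.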


49. Combination of Angular Momenta 


Suppose we have two particles moving in the central field of force, whose angular
momenta are the vectors 𝐦 and 𝛍. The magnitudes of these vectors are
the observables k and κ, defined by (12) and

\[ \kappa = (\mu_x^2 + \mu_y^2 + \mu_z^2 + \tfrac{1}{4}h^2)^{1/2} \]

respectively. The total angular momentum will then be the vector 𝐌 = 𝐦 + 𝛍,
whose magnitude is

\[ K = (M_x^2 + M_y^2 + M_z^2 + \tfrac{1}{4}h^2)^{1/2}. \]

Each of the observables k and κ commutes with all the components of 𝐦, 𝛍 and
𝐌. Thus k, κ and K will commute with each other and can be given numerical
values simultaneously. Our problem now is to determine the possible numerical
values for K when k and κ have given numerical values.

The easiest way of solving this problem is to suppose k and κ are equal to
two given numbers, as we can do since they commute with all the observables
mentioned in the problem, and then to use a matrix representation in which m_z and
μ_z are diagonal. We can ignore all the observables describing the dynamical system
that are not functions of the components of 𝐦 and 𝛍. Our matrix representation
will then have only a finite number of rows and columns, each labelled by a number
m_z′ having one of the values k − ½h, k − (3/2)h, ..., −k + ½h and a number μ_z′ having
one of the values κ − ½h, κ − (3/2)h, ..., −κ + ½h. The possible values of M_z′ = m_z′ + μ_z′
will then be k + κ − h, k + κ − 2h, k + κ − 3h, ..., −k − κ + h. The number of
times each of them occurs is given by the following scheme (if one assumes for
definiteness that k ≥ κ),

\[ \begin{array}{ccccccc}
k+\kappa-h, & k+\kappa-2h, & k+\kappa-3h, & \ldots, & k-\kappa, & k-\kappa-h, & \ldots, \\
1 & 2 & 3 & \ldots & 2\kappa/h & 2\kappa/h & \ldots \\[6pt]
\ldots, & -k+\kappa, & -k+\kappa-h, & \ldots, & -k-\kappa+h. & & \\
\ldots & 2\kappa/h & 2\kappa/h-1 & \ldots & 1 & &
\end{array} \tag{35} \]

If we now make a canonical transformation to a representation in which K and
M_z are diagonal, the number of rows and columns of the matrices for which M_z
has a given value M_z′ must remain unaltered. If K′, K″, ... are the possible values
for K, there will be a set of rows and columns having the M_z-values K′ − ½h,
K′ − (3/2)h, ..., −K′ + ½h, together with a set having the M_z-values K″ − ½h,
K″ − (3/2)h, ..., −K″ + ½h, &c. Comparing this distribution of M_z-values with (35),
we see that the possible values for K must be

\[ k+\kappa-\tfrac{1}{2}h,\quad k+\kappa-\tfrac{3}{2}h,\quad k+\kappa-\tfrac{5}{2}h,\ \ldots,\ k-\kappa+\tfrac{1}{2}h. \tag{36} \]

This result is a quite general one applying to the combination of any two
angular momenta, not necessarily the orbital angular momenta of two particles.
For example, it could be applied to the orbital angular momentum and spin of
an electron. In this case, since the spin angular momentum has the magnitude
κ = h, it shows that when the orbital angular momentum has the magnitude k,
the combined angular momentum can have only one or other of the two magnitudes
k ± ½h.
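
The counting argument that leads from the scheme (35) to the rule (36) can be reproduced by a short Python sketch. It is offered as an illustration under assumptions, not as the method of the text: the names `z_values` and `combine` are hypothetical, and units with h = 1 are assumed so that k, κ and K are exact fractions. The sketch tabulates how often each value of M_z′ = m_z′ + μ_z′ occurs, then repeatedly removes the set of M_z-values belonging to the largest remaining K.

```python
from collections import Counter
from fractions import Fraction

HALF = Fraction(1, 2)


def z_values(k):
    """The values k - 1/2, k - 3/2, ..., -k + 1/2 (units with h = 1 assumed)."""
    return [k - HALF - j for j in range(int(2 * k))]


def combine(k, kappa):
    """Possible magnitudes K of the total angular momentum, largest first."""
    counts = Counter(m + mu for m in z_values(k) for mu in z_values(kappa))
    magnitudes = []
    while any(c > 0 for c in counts.values()):
        top = max(v for v, c in counts.items() if c > 0)
        big_k = top + HALF                 # the largest M_z value equals K - 1/2
        magnitudes.append(big_k)
        for mz in z_values(big_k):         # remove one full set of M_z values
            counts[mz] -= 1
    return magnitudes


if __name__ == "__main__":
    print(combine(Fraction(5, 2), Fraction(3, 2)))
    print(combine(Fraction(3, 2), Fraction(1)))
```

For an orbital magnitude k = 3/2 combined with the spin magnitude κ = 1 the sketch returns the two values k ± ½, as stated above.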

We now have a general method for dealing with complicated atomic systems. 
For an isolated system the total angular momentum 𝐌 is always a constant of
the motion and its resultant K together with one of its components M_z will be
two commuting constants of the motion. We try to express 𝐌 as the sum of two
angular momenta 𝐦 and 𝛍 whose magnitudes k and κ are constants of the motion.
If we can do this, then we try to express either of the parts, 𝐦 say, itself as
a sum of two angular momenta, 𝐦_1 and 𝐦_2 say, whose magnitudes k_1 and k_2 are
constants of the motion, and so on. We obtain in this way a series of constants of
the motion M_z, K, k, κ, k_1, k_2, ... which all commute with each other and may,
if there are enough of them, be taken as defining a Heisenberg representation.
The possible numerical values for the K, k, κ, ... specifying a row and column are
restricted by the general rule (36). The energy will be some function of K, k, κ,
k_1, k_2, ..., but independent of M_z. In general one cannot secure that k, κ, k_1, k_2
are exactly constants of the motion, but one may be able to choose them so that
they are approximately so and then apply a perturbation method, as discussed in
the next chapter.

We shall now obtain the selection rule for the magnitude K of the total
angular momentum 𝐌 of a general atomic system. Let 𝐦 be the orbital angular
momentum of one of the electrons, whose co-ordinates are x, y, z, say, and let
𝐌 − 𝐦 = 𝛍. It is not necessary for the present discussion that the magnitudes k
and κ of the two angular momenta 𝐦 and 𝛍 into which 𝐌 has been split up should
be constants of the motion. We must obtain the condition that the (K′, K″) matrix
element of x, y, or z shall not vanish. This is evidently the same as the condition
that the (K′, K″) matrix element of λ_1, λ_2 or λ_3 shall not vanish, where λ_1, λ_2
and λ_3 are any three independent linear functions of x, y and z with numerical
coefficients, or more generally with any coefficients that commute with K and are
thus represented by matrices which are diagonal with respect to K. Let

\[ \begin{aligned}
\lambda_0 &= M_x x + M_y y + M_z z, \\
\lambda_x &= M_y z - M_z y - ihx, \\
\lambda_y &= M_z x - M_x z - ihy, \\
\lambda_z &= M_x y - M_y x - ihz.
\end{aligned} \]

We have

\[ \begin{aligned}
M_x\lambda_x + M_y\lambda_y + M_z\lambda_z &= \sum_{xyz} (M_x M_y z - M_x M_z y - ihM_x x) \\
&= \sum_{xyz} (M_x M_y - M_y M_x - ihM_z) z = 0
\end{aligned} \tag{37} \]

from the general condition (11) for angular momentum. Thus λ_x, λ_y and λ_z are
not linearly independent functions of x, y and z. Any two of them, however,
together with λ_0 are three linearly independent functions of x, y and z and may be
taken as the above λ_1, λ_2 and λ_3, since the coefficients M_x, M_y, M_z all commute
with K. Our problem thus reduces to finding the condition that the (K′, K″)
matrix elements of λ_0, λ_x, λ_y and λ_z shall not vanish. The physical meanings of
these λ’s are that λ_0 is proportional to the component of the vector (x, y, z) in
the direction of the vector 𝐌 and λ_x, λ_y, λ_z are proportional to the Cartesian
components of the component of (x, y, z) perpendicular to 𝐌.
From (4) together with the condition that x, y and z commute with 𝛍 we obtain

\[ [M_z, x] = [m_z + \mu_z, x] = y, \qquad [M_z, y] = -x, \qquad [M_z, z] = 0. \tag{38} \]

Hence

\[ \begin{aligned}
[M_z, \lambda_0] &= [M_z, M_x]x + M_x[M_z, x] + [M_z, M_y]y + M_y[M_z, y] \\
&= M_y x + M_x y - M_x y - M_y x = 0.
\end{aligned} \]

Thus λ_0 commutes with M_z, and from symmetry it must commute also with M_x
and M_y, so that it must commute with K. It follows that only the diagonal
elements (K′|λ_0|K′) of λ_0 can differ from zero, so the selection rule is that K
cannot change so far as this component of the electric displacement is concerned.
With further applications of (38) we obtain

\[ \begin{aligned}
[M_z, \lambda_x] &= [M_z, M_y]z - M_z[M_z, y] - ih[M_z, x] \\
&= -M_x z + M_z x - ihy = \lambda_y, \\
[M_z, \lambda_y] &= M_z[M_z, x] - [M_z, M_x]z - ih[M_z, y] \\
&= M_z y - M_y z + ihx = -\lambda_x, \\
[M_z, \lambda_z] &= [M_z, M_x]y + M_x[M_z, y] - [M_z, M_y]x - M_y[M_z, x] = 0.
\end{aligned} \]

These relations between M_z and λ_x, λ_y, λ_z are of exactly the same form as
the relations (4) and (5) between m_z and x, y, z and also (37) is of the same
form as (3). The observables λ_x, λ_y, λ_z thus have the same properties relative
to the angular momentum 𝐌 that x, y, z have relative to 𝐦. The deduction
in §47 of the selection rule for k when the electric displacement is proportional to
(x, y, z) can therefore be taken over and applied to the selection rule for K when
the electric displacement is proportional to (λ_x, λ_y, λ_z). We find in this way that,
so far as λ_x, λ_y, λ_z are concerned, the selection rule for K is that it must change
by ±h.

Collecting results, we have as the selection rule for K that it must change by
0 or ±h. We have considered the electric displacement produced by only one of
the electrons, but the same selection rule must hold for each of them and thus also
for the total electric displacement.











IX. PERTURBATION THEORY 


50. General Remarks 


IN the preceding two chapters exact treatments were given of some simple 
dynamical systems in the quantum theory. Most quantum problems, however, 
cannot be solved exactly with the present† resources of mathematics, as they
lead to equations whose solutions cannot be expressed in finite terms with 
the help of the ordinary functions of analysis. For such problems one must 
use a perturbation method. This consists in splitting up the Hamiltonian into 
two parts, one of which must be simple and the other small. The first part may then 
be considered as the Hamiltonian of a simplified or unperturbed system, which can 
be dealt with exactly, and the addition of the second will then require small 
corrections, of the nature of a perturbation, in the solution for the unperturbed 
system. If this second part contains a small numerical factor ε, we can obtain
the solution of our equations for the perturbed system in the form of a power series
in ε, which, provided it converges, will give the answer to our problem with any
desired accuracy. Even when the series does not converge, the first approximation 
obtained by means of it is usually fairly accurate. 

There are two distinct methods in perturbation theory. In one of these 
the perturbation is considered as causing a modification of the states of 
the unperturbed system. In the other we do not consider any modification 
to be made in the states of the unperturbed system, but we suppose that 
the perturbed system, instead of remaining in one of these states, is continually 
changing from one to another, or making transitions, under the influence of 
the perturbation. Which method is to be used in any particular case depends 
on the nature of the problem to be solved. The first method is useful usually 
only when both the Hamiltonian for the undisturbed system and the perturbing 
energy (the correction in this Hamiltonian) do not involve the time explicitly, 
and is then applied to the stationary states. It can then be used for calculating 
things that do not refer to any definite time, such as the energy-levels of
the stationary states of the perturbed system, or, in the case of collision problems,
the probability of scattering through a given angle. The second method must, on
the other hand, be used for solving all problems involving a consideration of time,
such as those about the transient phenomena that occur when the perturbation
is suddenly applied, or more generally problems in which the perturbation varies
with the time in any way (i.e. in which the perturbing energy involves the time
explicitly in an arbitrary way). Again, this second method must be used in
collision problems, even though the perturbing energy does not here involve the
time explicitly, if one wishes to calculate absorption and emission probabilities,
since these probabilities, unlike a scattering probability, cannot be defined without
reference to a state of affairs that varies with the time.

†The early Twentieth Century


51. The Change in the Energy-levels caused by 
a Perturbation 


The first of the above-mentioned methods will now be applied to the calculation 
of the changes in the energy-levels of a system caused by a perturbation. 
The perturbing energy, like the Hamiltonian for the unperturbed system, must now 
not involve the time explicitly. Our problem has a meaning, of course, 
only provided the energy-levels of the unperturbed system are discrete and 
the differences between them are large compared with the changes in them 
caused by the perturbation. This fact results in the treatment of perturbation 
problems by the first method having some different features according to whether 
the energy-levels of the unperturbed system are discrete or continuous. 
Let the Hamiltonian of the perturbed system be

\[ H = H_0 + V, \tag{1} \]

H_0 being the Hamiltonian of the unperturbed system and V the small perturbing
energy. By hypothesis each eigenvalue H′ of H lies very close to one and only
one eigenvalue H_0′ of H_0. It is convenient to use the same number of primes
to specify any eigenvalue of H and the eigenvalue of H_0 to which it lies very close.
Thus we shall have H″ differing from H_0″ by a small quantity of order V and
differing from H_0′ by a quantity that is not small unless H_0′ = H_0″. We must now
take care always to use different numbers of primes to specify eigenvalues of H
and H_0 which we do not want to lie very close together.
Let ψ(H′) be an eigen-ψ of H belonging to the eigenvalue H′, so that

\[ H\psi(H') = H'\psi(H'). \tag{2} \]

This means that ψ(H′) denotes a stationary state of the perturbed system of
energy H′. Again, let ψ(H_0′) be an eigen-ψ of H_0 (at some particular time t)
belonging to the eigenvalue H_0′, so that

\[ H_0\psi(H_0') = H_0'\psi(H_0'). \tag{3} \]

This ψ(H_0′) will denote a non-stationary state of the perturbed system, and indeed
a different non-stationary state for each different value of the above t, but for
the unperturbed system it will denote a stationary state of energy H_0′.

Now suppose that for the unperturbed system there is only one stationary
state for each energy-level H_0′, i.e. the unperturbed system is non-degenerate.
This requires that H_0 shall have only one independent eigen-ψ belonging to
any eigenvalue H_0′ (which is a condition governing only the form of the
observable H_0 and independent of whether we are considering the perturbed
or the unperturbed system). From our assumption that the changes
in the energy-levels caused by the perturbation are small compared with
the differences of the energy-levels of the unperturbed system, there must be only
one independent eigen-ψ of H belonging to any eigenvalue H′, so that the perturbed
system is also non-degenerate. The fact that the perturbing energy V is small,
or that H_0 (at time t) and H are two nearly equal observables, will require, not only
that their eigenvalues are nearly equal, but also that corresponding eigen-ψ’s are
nearly equal, apart from numerical factors. Thus we shall have

\[ \psi(H') = c\,\psi(H_0') + \psi_1, \tag{4} \]

where c is a number and ψ_1 is a small ψ-symbol. We may assume ψ_1 to be
orthogonal to ψ(H_0′), since if it were not so it could be expressed as the sum of
two parts, one of which is orthogonal to ψ(H_0′) while the other is a numerical
multiple of ψ(H_0′) which can be absorbed in the first term of the right-hand side
of (4). We can now take c = 1, so that we have

\[ \psi(H') = \psi(H_0') + \psi_1, \tag{5} \]

where ψ_1 is small and orthogonal to ψ(H_0′).
From (1), (2) and (5) we now obtain

\[ \{H_0 + V\}\{\psi(H_0') + \psi_1\} = H\psi(H') = H'\psi(H') = H'\{\psi(H_0') + \psi_1\}. \]

With the help of (3), this gives

\[ H_0'\psi(H_0') + H_0\psi_1 + V\psi(H_0') + V\psi_1 = H'\psi(H_0') + H'\psi_1. \]

If we neglect the second-order term Vψ_1, this reduces to

\[ \{H' - H_0'\}\psi(H_0') + \{H' - H_0\}\psi_1 = V\psi(H_0'). \tag{6} \]

If we now multiply this equation throughout by φ(H_0′), the conjugate imaginary
symbol to ψ(H_0′), on the left, the second term will contribute nothing, since

\[ \phi(H_0')\{H' - H_0\}\psi_1 = \phi(H_0')\{H' - H_0'\}\psi_1 = \{H' - H_0'\}\phi(H_0')\psi_1 = 0, \]

on account of φ(H_0′) and ψ_1 being orthogonal. We shall thus be left with

\[ H' - H_0' = \phi(H_0')V\psi(H_0'), \tag{7} \]

assuming φ(H_0′) and ψ(H_0′) to be normalized.

This result gives us the first-order change in the energy-level of any state caused 
by the perturbation. It shows that the first-order change in the energy-level is equal 
to the average value of the perturbing energy for the unperturbed stationary state. 
When formulated in this way, this result in quantum perturbation theory is 
the same as in the classical theory and as in the old quantum mechanics of 
Niels Bohr’s theory. One can say alternatively that the first-order change in 
an energy-level is equal to the corresponding diagonal element of the matrix 
representing the perturbing energy in a representation in which the Hamiltonian 
for the unperturbed system is diagonal, i.e. in a Heisenberg representation for
the unperturbed system. 

We must now consider the case when the unperturbed system is degenerate,
so that there are several eigen-ψ’s of H_0 belonging to the same eigenvalue H_0′.
The perturbation may now, perhaps, be such that the perturbed system is
non-degenerate, or that it is not so much degenerate as the unperturbed system.
This means that each energy-level H_0′ of the unperturbed system gets split up
by the perturbation into several energy-levels H′ all lying close to H_0′.* We shall
now have that every eigen-ψ of H is approximately equal to an eigen-ψ of H_0,
but the converse, that every eigen-ψ of H_0 is approximately equal to an eigen-ψ
of H, will not be true, as may be seen from the following argument. If ψ_a and ψ_b
are two eigen-ψ’s of H_0 belonging to the same eigenvalue and are approximately
equal respectively to two eigen-ψ’s of H belonging to two different eigenvalues,
then any linear combination of them, aψ_a + bψ_b, will also be an eigen-ψ of H_0
but will not be approximately equal to any eigen-ψ of H. The problem of finding
which eigen-ψ’s of H_0 are approximately equal to eigen-ψ’s of H is the analogue
of the problem of secular perturbations in classical mechanics.

*To distinguish these energy-levels one from another we should require some more elaborate
notation, since according to the present notation they must all be specified by the same number
of primes, namely, by the number of primes specifying the energy-level of the unperturbed
system from which they arise. For our present purposes, however, this more elaborate notation
is not required.

Any eigen-ψ of H_0 belonging to the eigenvalue H_0′ is expressible as a linear
combination of a complete set of such eigen-ψ’s. We shall choose such a set
consisting of the simultaneous eigen-ψ’s, ψ(H_0′ξ′), of H_0 and a number of
observables ξ that commute with H_0 and with each other and that together with
H_0 form a complete commuting set of observables. Any eigen-ψ ψ(H_0′) is now
expressible in the form

\[ \psi(H_0') = \sum_{\xi'} \psi(H_0'\xi')\,(\xi'|), \]

where the coefficients (ξ′|) are numbers forming a representative of ψ(H_0′).
Any eigen-ψ ψ(H′) of H, belonging to some eigenvalue H′ that lies close to H_0′,
is approximately equal to some ψ(H_0′) and is therefore of the form

\[ \psi(H') = \sum_{\xi'} \psi(H_0'\xi')\,(\xi'|) + \psi_1, \tag{8} \]

where ψ_1 is small. As in the non-degenerate case, we may assume that ψ_1 is
orthogonal to each ψ(H_0′ξ′), since if it is not it can be expressed as the sum of
two parts, one of which is orthogonal to the ψ(H_0′ξ′)’s while the other is a linear
combination of them. We now obtain with the help of (1), (2) and (3)

\[ \{H_0 + V\}\Big\{\sum_{\xi'} \psi(H_0'\xi')(\xi'|) + \psi_1\Big\} = H\psi(H') = H'\psi(H') = H'\Big\{\sum_{\xi'} \psi(H_0'\xi')(\xi'|) + \psi_1\Big\} \]
or
\[ \{H' - H_0'\}\sum_{\xi'} \psi(H_0'\xi')(\xi'|) + \{H' - H_0\}\psi_1 = V\sum_{\xi'} \psi(H_0'\xi')(\xi'|), \]

with neglect of the second-order term Vψ_1. If we multiply this equation throughout
by φ(H_0′ξ″) on the left, we shall again have the term φ(H_0′ξ″){H′ − H_0}ψ_1 vanishing
and shall be left with

\[ \{H' - H_0'\}(\xi''|) = \sum_{\xi'} \phi(H_0'\xi'')V\psi(H_0'\xi')\,(\xi'|), \]

provided the ψ(H_0′ξ′) are normalized. This result is the same as

\[ \{H' - H_0'\}(\xi'|) = \sum_{\xi''} (H_0'\xi'|V|H_0'\xi'')\,(\xi''|), \tag{9} \]

where (H_0′ξ′|V|H_0′ξ″) is an element of the matrix representing V in
the (H_0, ξ)-representation.

Equation (9) is of the form of the standard equation of the theory of
eigenvalues. It shows that $H' - H_0'$ is an eigenvalue of the matrix $(H_0'\xi'|V|H_0'\xi'')$.
This matrix is a part of the representative of the perturbing energy $V$ in
a Heisenberg representation for the unperturbed system, namely the part
consisting of those elements that refer to the same unperturbed energy-level
$H_0'$ for their row and column. Each change of the energy-level $H_0'$ caused by
the perturbation is an eigenvalue of this matrix and further the eigenfunctions,
namely the quantities $(\xi'|)$, are just the coefficients required in (8) to give us those
linear functions of the eigen-ψ's of $H_0$ belonging to the eigenvalue $H_0'$ that are
approximately eigen-ψ's of $H$ and approximately represent stationary states of
the perturbed system. We have thus obtained to the first order the energy-levels
and stationary states of the perturbed system. It should be noticed that these
first-order results are independent of the values of all those matrix elements of
the perturbing energy which refer to two different energy-levels $H_0'$ and $H_0''$ of
the unperturbed system.
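
As an illustration of equation (9), the following minimal sketch treats the simplest degenerate case, in which only two unperturbed states share the level $H_0'$. The function names and the numerical values are assumptions made for the example and are not taken from the text. The first-order changes $H' - H_0'$ are then the two eigenvalues of a 2×2 Hermitian block of $V$, and the coefficients $(\xi'|)$ are the components of the corresponding eigenvectors.

```python
import math


def first_order_shifts(v11, v22, v12):
    """Eigenvalues of the Hermitian block [[v11, v12], [conj(v12), v22]].

    v11 and v22 are the real diagonal elements of V within the degenerate
    level, v12 the (possibly complex) off-diagonal element.
    """
    mean = 0.5 * (v11 + v22)
    half_gap = 0.5 * (v11 - v22)
    radius = math.sqrt(half_gap ** 2 + abs(v12) ** 2)
    return mean - radius, mean + radius


def coefficients(v11, v22, v12, shift):
    """Normalized eigenvector (c1, c2) of the block for the eigenvalue `shift`."""
    if v12 == 0:
        # The block is already diagonal: the eigenvectors are the basis states.
        return (1.0, 0.0) if shift == v11 else (0.0, 1.0)
    c1, c2 = complex(v12), complex(shift - v11)
    norm = math.sqrt(abs(c1) ** 2 + abs(c2) ** 2)
    return c1 / norm, c2 / norm


if __name__ == "__main__":
    lower, upper = first_order_shifts(0.0, 0.0, 0.1)
    print("first-order changes in the energy-level:", lower, upper)
    print("coefficients for the upper level:", coefficients(0.0, 0.0, 0.1, upper))
```

With vanishing diagonal elements the level splits symmetrically, and the two approximately stationary states are the symmetric and antisymmetric combinations of the unperturbed ones.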

One can use this perturbation method for the calculation of the higher 
approximations if required. General recurrence formulas giving the n-th order 
corrections in terms of those of lower order have been obtained by Born, 
Heisenberg, and Jordan.* 


52. The Perturbation considered as causing 
Transitions 


We shall now consider the second of the two perturbation methods mentioned
in §50. We suppose again that we have an unperturbed system governed by
a Hamiltonian $H_0$ which does not involve the time explicitly, and a perturbing
energy $V$ which can now be an arbitrary function of the time. The Hamiltonian
for the perturbed system is again $H = H_0 + V$. For the present method it does not
make any essential difference whether the energy-levels of the unperturbed system,
i.e. the eigenvalues of $H_0$, form a discrete or continuous set. We shall, however,
take the discrete case, for definiteness.

We introduce an α-representation in which a complete set of commuting
observables α are diagonal, each of which is the value at time $t$ of some dynamical
variable that is a constant of the motion for the unperturbed system. This means
that $H_0$ at time $t$ commutes with each of the α's and is thus represented by
a diagonal matrix

$$(\alpha'|H_0|\alpha'') = H_0'\,\delta_{\alpha'\alpha''}. \qquad (10)$$





*Born, M., Heisenberg, W., & Jordan, P. (1926). "Zur Quantenmechanik. II." Zeitschrift für
Physik, 35(8-9), 557-615. doi:10.1007/bf01379806.

If the phases of the representation are such that the Schrödinger equation holds,
we have, using stars to distinguish the representatives in this case,

$$i\hbar\,\frac{\partial}{\partial t}(\alpha'|)^* = \sum_{\alpha''} (\alpha'|H_0 + V|\alpha'')^*\,(\alpha''|)^* = H_0'\,(\alpha'|)^* + \sum_{\alpha''} (\alpha'|V|\alpha'')^*\,(\alpha''|)^*. \qquad (11)$$

For our present purpose, however, it is more convenient to choose these phases
to be those of the Heisenberg representation for the undisturbed system, so that
our representative $(\alpha'|)$ of a state is connected with the Schrödinger one $(\alpha'|)^*$ by
the relation

$$(\alpha'|)^* = e^{-iH_0't/\hbar}\,(\alpha'|), \qquad (12)$$

which was obtained at the end of §38. The two representatives of an observable $\xi$
will be connected in the same way by

$$(\alpha'|\xi|\alpha'')^* = e^{-i(H_0' - H_0'')t/\hbar}\,(\alpha'|\xi|\alpha'').$$

The representative (10) of $H_0$ is, of course, the same in either case, since it
is diagonal.

Our new representative $(\alpha'|)$ does not satisfy the Schrödinger equation,
of course, but satisfies instead the following equation, obtained by substituting (12)
in (11),

$$i\hbar\Big[-\frac{iH_0'}{\hbar}\,e^{-iH_0't/\hbar}\,(\alpha'|) + e^{-iH_0't/\hbar}\,\frac{\partial}{\partial t}(\alpha'|)\Big] = H_0'\,e^{-iH_0't/\hbar}\,(\alpha'|) + \sum_{\alpha''} (\alpha'|V|\alpha'')^*\,e^{-iH_0''t/\hbar}\,(\alpha''|),$$

which reduces to

$$i\hbar\,\frac{\partial}{\partial t}(\alpha'|) = \sum_{\alpha''} e^{i(H_0' - H_0'')t/\hbar}\,(\alpha'|V|\alpha'')^*\,(\alpha''|) = \sum_{\alpha''} (\alpha'|V|\alpha'')\,(\alpha''|). \qquad (13)$$


The Schrödinger representative $(\alpha'|V|\alpha'')^*$ of the perturbing energy $V$ does not
depend on $t$, except in so far as $V$ itself involves $t$ explicitly, while the representative
$(\alpha'|V|\alpha'')$ appearing in our equation (13) varies rapidly with $t$, according to
the Heisenberg law $e^{i(H_0' - H_0'')t/\hbar}$ when one neglects the explicit dependence of $V$
on $t$.

Equation (13) is the fundamental equation of the present method in
perturbation theory. It is an exact equation, no use having yet been made of
the fact that the perturbation is small. It shows how the representative of a state
of a perturbed system varies with the time when the representation is chosen so
that the whole of this variation is caused by the perturbation, and thus expresses
most clearly the way in which the perturbation may be considered as causing
a continual change in the state of the system. At any instant the probability of
the α's having specified values $\alpha'$ is

$$P' = |(\alpha'|)|^2 \qquad (14)$$

provided P’ is normalized.

We shall now obtain an approximate solution to equation (13) for a given
initial value of the representative $(\alpha'|)$ of the state. Since $V$ is small, the rate of
change of $(\alpha'|)$ is small and $(\alpha'|)$ remains approximately equal to its initial value,
at any rate for times that do not differ too much from the initial time. We can thus
obtain a first approximation by substituting for $(\alpha''|)$ in the right-hand side of (13)
its initial value and then performing a simple integration. We may then obtain
a second approximation by substituting the first approximation in the right-hand
side of (13), and so on indefinitely.

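Let the initial value of $(\alpha'|)$, i.e. the value at time $t = 0$, be $a_0(\alpha')$, or $a_0'$ say,
for brevity. We shall then have in the first approximation for the value of $(\alpha'|)$ at
an arbitrary time $T$,

$$(\alpha'|)_T = a_0' - \frac{i}{\hbar}\sum_{\alpha''} a_0'' \int_0^T (\alpha'|V|\alpha'')_t\,dt = a_0' + a_{1T}',$$

say, $a_1'$ being the first-order correction, whose value at time $T$ is

$$a_{1T}' = -\frac{i}{\hbar}\sum_{\alpha''} a_0'' \int_0^T (\alpha'|V|\alpha'')_t\,dt. \qquad (15)$$

The second approximation at an arbitrary time $T$ will now be

$$(\alpha'|)_T = a_0' - \frac{i}{\hbar}\sum_{\alpha''} \int_0^T (\alpha'|V|\alpha'')_\tau\,[a_0'' + a_{1\tau}'']\,d\tau = a_0' + a_{1T}' + a_{2T}',$$

where $a_2'$, the second-order correction, has the value at time $T$

$$a_{2T}' = -\frac{i}{\hbar}\sum_{\alpha''} \int_0^T (\alpha'|V|\alpha'')_\tau\,a_{1\tau}''\,d\tau = -\frac{1}{\hbar^2}\sum_{\alpha''\alpha'''} a_0''' \int_0^T (\alpha'|V|\alpha'')_\tau\,d\tau \int_0^\tau (\alpha''|V|\alpha''')_t\,dt. \qquad (16)$$

The probability (14) of the α's having the values $\alpha'$ at any time is now,
to the second order of accuracy,

$$P' = (a_0' + a_1' + a_2')(\bar a_0' + \bar a_1' + \bar a_2') = a_0'\bar a_0' + (a_1'\bar a_0' + a_0'\bar a_1') + (a_2'\bar a_0' + a_1'\bar a_1' + a_0'\bar a_2') + \cdots = P_0' + P_1' + P_2' + \cdots, \qquad (17)$$

$P_0'$ being the initial value of this probability and $P_1'$ and $P_2'$ being the first and
second order corrections.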

Suppose now that we are given, not the initial value $a_0'$ of $(\alpha'|)$, but only
the initial probability $P_0'$ of the α's having any specified values $\alpha'$, and want
to calculate the probability at any subsequent time of the α's having
specified values. We now know only the modulus of $(\alpha'|)$ and not its phase,
so that we must average over all phases. This averaging results in a considerable
simplification in the expression (17) for $P'$, since this expression is bilinear in $a_0$
and $\bar a_0$ [both $a_1$ and $a_2$ being linear functions of $a_0$ according to (15) and (16)],
and thus consists of a sum of terms of the form $a_0''\bar a_0'''$. The average of $a_0''\bar a_0'''$ or
$a_0(\alpha'')\bar a_0(\alpha''')$ will vanish except when $\alpha''' = \alpha''$, so that the only surviving terms
will be those of the form $a_0''\bar a_0''$. In this way $P_1'$ reduces to

$$P_{1T}' = a_{1T}'\bar a_0' + a_0'\bar a_{1T}' = \Big[-\frac{i}{\hbar}\,a_0' \int_0^T (\alpha'|V|\alpha')_t\,dt\Big]\bar a_0' + a_0'\Big[\frac{i}{\hbar}\,\bar a_0' \int_0^T \overline{(\alpha'|V|\alpha')_t}\,dt\Big] = 0.$$


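Similarly $P_2'$ reduces to

$$P_{2T}' = a_{2T}'\bar a_0' + a_{1T}'\bar a_{1T}' + a_0'\bar a_{2T}'$$

$$= -\frac{1}{\hbar^2}\,a_0'\bar a_0' \sum_{\alpha''} \int_0^T (\alpha'|V|\alpha'')_\tau\,d\tau \int_0^\tau (\alpha''|V|\alpha')_t\,dt + \frac{1}{\hbar^2} \sum_{\alpha''} a_0''\bar a_0'' \left|\int_0^T (\alpha'|V|\alpha'')_t\,dt\right|^2 - \frac{1}{\hbar^2}\,a_0'\bar a_0' \sum_{\alpha''} \int_0^T (\alpha''|V|\alpha')_\tau\,d\tau \int_0^\tau (\alpha'|V|\alpha'')_t\,dt,$$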
162 IX. PERTURBATION THEORY 


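use being made, in dealing with the third term, of the fact that the matrix $(\alpha'|V|\alpha'')$
is Hermitian. If we interchange $t$ and $\tau$ in this third term, we can combine it with
the first term to give

$$-\frac{|a_0'|^2}{\hbar^2} \sum_{\alpha''} \Big\{\int_0^T d\tau \int_0^\tau dt + \int_0^T dt \int_0^t d\tau\Big\}\,(\alpha'|V|\alpha'')_\tau\,(\alpha''|V|\alpha')_t$$

$$= -\frac{|a_0'|^2}{\hbar^2} \sum_{\alpha''} \int_0^T d\tau \int_0^T dt\,(\alpha'|V|\alpha'')_\tau\,(\alpha''|V|\alpha')_t = -\frac{|a_0'|^2}{\hbar^2} \sum_{\alpha''} \left|\int_0^T (\alpha'|V|\alpha'')_t\,dt\right|^2.$$

Thus our expression for $P_2'$ becomes

$$P_{2T}' = \frac{1}{\hbar^2} \sum_{\alpha''} \{|a_0''|^2 - |a_0'|^2\} \left|\int_0^T (\alpha'|V|\alpha'')_t\,dt\right|^2 = \frac{1}{\hbar^2} \sum_{\alpha''} \{P_0'' - P_0'\} \left|\int_0^T (\alpha'|V|\alpha'')_t\,dt\right|^2,$$

and the probability $P'$ of the α's having the values $\alpha'$ is, to the second order
of accuracy,

$$P_T' = P_0' + \frac{1}{\hbar^2} \sum_{\alpha''} \{P_0'' - P_0'\} \left|\int_0^T (\alpha'|V|\alpha'')_t\,dt\right|^2. \qquad (18)$$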
This result is capable of a simple interpretation. If we suppose that initially
the α's certainly have the values $\alpha''$, so that $P_0'' = 1$, $P_0' = 0$ for $\alpha' \neq \alpha''$
(in which special case the averaging over the phases of the $a_0$'s is not necessary),
then the right-hand side of (18) reduces to the single term

$$\frac{1}{\hbar^2} \left|\int_0^T (\alpha'|V|\alpha'')_t\,dt\right|^2 = P(\alpha'', \alpha') \qquad (19)$$

say. This may be interpreted as the probability of the system making a transition
from the state $\alpha''$ to the state $\alpha'$ under the influence of the perturbation $V$ during
the interval of time 0 to $T$. It is symmetrical between $\alpha'$ and $\alpha''$. Returning
now to the general case, we see that (18) may be regarded as expressing that
the change in the probability of the α's having the values $\alpha'$ during the time interval
0 to $T$, namely $P_T' - P_0'$, is made up of the total probability $\sum_{\alpha''} P_0''\,P(\alpha'', \alpha')$ of
the system jumping into the state $\alpha'$ from some other state $\alpha''$, minus the total
probability $P_0' \sum_{\alpha''} P(\alpha', \alpha'')$ of its jumping out of the state $\alpha'$, during this time
interval. Thus the ordinary laws of probability apply, showing that there is no
interference between the different transition processes. If we had not averaged
over the initial phases, then there would have been such interference.
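
The gain-and-loss reading of (18) can be sketched in a few lines. The function name, the three-state example and the numerical transition probabilities below are assumptions chosen for illustration; the only properties used are that $P(\alpha'', \alpha')$ is symmetrical and small.

```python
def evolve_probabilities(p0, trans):
    """Apply (18): new P' = P0' + sum over other states of {P0'' - P0'} P(a'', a').

    p0[i] is the initial probability of state i, and trans[i][j] is the
    transition probability between states i and j, assumed symmetric.
    """
    n = len(p0)
    result = []
    for a in range(n):
        # Total probability of jumping into state a from the other states.
        gain = sum(p0[b] * trans[b][a] for b in range(n) if b != a)
        # Total probability of jumping out of state a.
        loss = p0[a] * sum(trans[a][b] for b in range(n) if b != a)
        result.append(p0[a] + gain - loss)
    return result


if __name__ == "__main__":
    initial = [1.0, 0.0, 0.0]
    transition = [
        [0.00, 0.01, 0.02],
        [0.01, 0.00, 0.03],
        [0.02, 0.03, 0.00],
    ]
    final = evolve_probabilities(initial, transition)
    print(final, sum(final))
```

Because the transition probabilities are symmetrical, every gain of one state is the loss of another, so the total probability stays equal to unity to this order.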

The integrand in (19) is the representative in a certain representation of
the perturbing energy at time $t$. This representation is one that does not
depend very much on $t$, since if we put $V = 0$ it would become the Heisenberg
representation and would not depend on $t$ at all. Hence we can, without spoiling
the order of accuracy of our result, replace the integral in (19) by $(\alpha'|\int_0^T V_t\,dt\,|\alpha'')$
and obtain an alternative expression for the transition probability

$$P(\alpha'', \alpha') = \frac{1}{\hbar^2} \left|(\alpha'|\int_0^T V_t\,dt\,|\alpha'')\right|^2. \qquad (20)$$

This provides a simple physical meaning for the non-diagonal elements of
the matrix representing an observable when this observable can be regarded as
the time integral of a perturbing energy.














53. Application to Radiation 


In the preceding section a general theory of the perturbation of an atomic system 
was developed, in which the perturbing energy could vary with the time in 
an arbitrary way. A perturbation of this kind can be realized in practice by 
allowing incident electromagnetic radiation to fall on the system. Let us see what 
our result (19) or (20) reduces to in this case. 

If we neglect the effects of the magnetic field of the incident radiation, 
and if we further assume that the wave-lengths of the harmonic components of 
this radiation are all large compared with the dimensions of the atomic system, 
then the perturbing energy is simply the scalar product 


$$V = (\mathbf{D}, \boldsymbol{\mathcal{E}}), \qquad (21)$$

where $\mathbf{D}$ is the total electric displacement of the system and $\boldsymbol{\mathcal{E}}$ is the electric
force of the incident radiation. We suppose $\boldsymbol{\mathcal{E}}$ to be a given function of the time.
If we take for simplicity the case when the incident radiation is plane polarized with
its electric vector in a certain direction and let $D$ denote the Cartesian component
of $\mathbf{D}$ in this direction, the equation (21) for $V$ reduces to the ordinary product

$$V = D\mathcal{E},$$

where $\mathcal{E}$ is the magnitude of the vector $\boldsymbol{\mathcal{E}}$. The matrix elements of $V$ are

$$(\alpha'|V|\alpha'') = (\alpha'|D|\alpha'')\,\mathcal{E},$$

since $\mathcal{E}$ is a number. Now $(\alpha'|D|\alpha'')$ varies with the time $t$ according to
the Heisenberg law

$$(\alpha'|D|\alpha'')_t = (\alpha'|D|\alpha'')_0\,e^{i(H_0' - H_0'')t/\hbar},$$

$(\alpha'|D|\alpha'')_0$ being constant, and hence our expression (19) for the transition
probability becomes

$$P(\alpha', \alpha'') = \frac{1}{\hbar^2}\,|(\alpha'|D|\alpha'')|^2 \left|\int_0^T e^{i(H_0' - H_0'')t/\hbar}\,\mathcal{E}_t\,dt\right|^2. \qquad (22)$$








If the incident radiation during the time interval 0 to $T$ is resolved into its
Fourier components, the energy crossing unit area per unit frequency range about
the frequency $\nu$ will be, according to classical electrodynamics,

$$E_\nu = \frac{c}{2\pi} \left|\int_0^T \mathcal{E}_t\,e^{2\pi i\nu t}\,dt\right|^2. \qquad (23)$$

Comparing this with (22), we see that the transition probability between two
states $\alpha'$ and $\alpha''$ with energies $H_0'$ and $H_0''$ depends on that Fourier component
of the incident radiation whose frequency is $\nu = |H_0' - H_0''|/h$ (with $h = 2\pi\hbar$), in agreement with
Bohr's theory. The magnitude of this transition probability is connected with
the intensity of the Fourier component through the relation

$$P(\alpha', \alpha'') = \frac{2\pi}{c\hbar^2}\,|(\alpha'|D|\alpha'')|^2\,E_\nu. \qquad (24)$$


This relation gives the probability of the system, if initially in the state of lower
energy, of absorbing radiation and being carried to the upper state, and if initially
in the upper state, of being stimulated by the incident radiation to emit and fall to
the lower state. The present theory does not account for the fact that the system,
if in the upper state with no incident radiation, can emit spontaneously and fall
to the lower state.
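
The consistency of (22), (23) and (24) can be checked numerically. The sketch below is only an illustration under stated assumptions: units with $\hbar = c = 1$, a cosine field of amplitude 0.01 acting from 0 to $T$, a trapezoidal quadrature, and function names invented for the example. It evaluates the transition probability once directly from (22) and once through the spectral energy (23) inserted in (24).

```python
import cmath
import math


def fourier_integral(field, omega, T, n=20000):
    """Trapezoidal estimate of the integral of field(t) * exp(i*omega*t) over 0..T."""
    h = T / n
    total = 0.5 * (field(0.0) + field(T) * cmath.exp(1j * omega * T))
    for k in range(1, n):
        t = k * h
        total += field(t) * cmath.exp(1j * omega * t)
    return total * h


def probability_direct(d_abs, field, delta_e, T, hbar=1.0):
    """Equation (22): |D|^2 / hbar^2 times the squared Fourier integral."""
    integral = fourier_integral(field, delta_e / hbar, T)
    return d_abs ** 2 * abs(integral) ** 2 / hbar ** 2


def spectral_energy(field, nu, T, c=1.0):
    """Equation (23): energy crossing unit area per unit frequency range."""
    integral = fourier_integral(field, 2.0 * math.pi * nu, T)
    return c / (2.0 * math.pi) * abs(integral) ** 2


def probability_from_spectrum(d_abs, e_nu, c=1.0, hbar=1.0):
    """Equation (24): transition probability in terms of E_nu."""
    return 2.0 * math.pi / (c * hbar ** 2) * d_abs ** 2 * e_nu


if __name__ == "__main__":
    field = lambda t: 0.01 * math.cos(t)   # assumed incident field
    delta_e, T, d_abs = 1.0, 10.0, 0.5     # assumed level spacing, duration, |D|
    nu = delta_e / (2.0 * math.pi)         # nu = |H0' - H0''| / h with hbar = 1
    print(probability_direct(d_abs, field, delta_e, T))
    print(probability_from_spectrum(d_abs, spectral_energy(field, nu, T)))
```

The two printed numbers agree to rounding error, since (24) is simply (22) rewritten with the help of (23).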

The existence of the phenomenon of stimulated emission was inferred
by Albert Einstein† long before the discovery of quantum mechanics, from
a consideration of thermodynamic equilibrium between atoms and a field of
black-body radiation satisfying Planck's law. Einstein showed that the transition
probability for stimulated emission must equal that for absorption between
the same pair of states and deduced a relation connecting this transition
†Einstein, Albert, "Zur Quantentheorie der Strahlung", Physikalische Zeitschrift 18,
pp. 121-128, https://ui.adsabs.harvard.edu/abs/1917PhyZ...18..121E English translation:
https://s3.cern.ch/inspire-prod-files-9/9e9ac9d1e25878322fe8876fdc8aa08d
probability with that for spontaneous emission. Werner Heisenberg's assumption
for the spontaneous emission probability, given in §38, together with Einstein's
theory, will therefore provide us with values for the transition probabilities for
absorption and stimulated emission. These values are in agreement with (24).
Thus the theory of the present section gives a partial justification for Heisenberg's
assumption. The complete justification will be provided by the general theory of
Chapter XII, in which the electromagnetic field will be treated as a dynamical
system interacting with the atom according to the laws of quantum mechanics.
This general theory will not only confirm the result (24) for absorption and
stimulated emission, but will also give the required value for the spontaneous
emission probability.


54. Transitions caused by a Perturbation 
Independent of the Time 


The perturbation method of §52 is still valid when the perturbing energy $V$ does
not involve the time $t$ explicitly. Since the total Hamiltonian $H$ in this case
does not involve $t$ explicitly, we could now, if desired, deal with the system by
the perturbation method of §51 and find its stationary states. Whether this method
would be convenient or not would depend on what we want to find out about
the system. If what we have to calculate makes an explicit reference to the time,
e.g. if we have to calculate the wave function at one time when we are given
its value at another time, the method of §52 would be the more convenient one.

Let us see what the result (19) for the transition probability becomes when
$V$ does not involve $t$ explicitly. The matrix element $(\alpha'|V|\alpha'')$ now varies with $t$
according to the Heisenberg law and thus its time integral is

$$\int_0^T (\alpha'|V|\alpha'')_t\,dt = (\alpha'|V|\alpha'')_0 \int_0^T e^{i(H_0' - H_0'')t/\hbar}\,dt = (\alpha'|V|\alpha'')_0\,\frac{e^{i(H_0' - H_0'')T/\hbar} - 1}{i(H_0' - H_0'')/\hbar},$$

provided $H_0' \neq H_0''$. Thus the transition probability (19) becomes

$$P(\alpha', \alpha'') = |(\alpha'|V|\alpha'')|^2\,[e^{i(H_0' - H_0'')T/\hbar} - 1][e^{-i(H_0' - H_0'')T/\hbar} - 1]/(H_0' - H_0'')^2$$

$$= 2\,|(\alpha'|V|\alpha'')|^2\,[1 - \cos\{(H_0' - H_0'')T/\hbar\}]/(H_0' - H_0'')^2. \qquad (25)$$


If $H_0'$ differs appreciably from $H_0''$ this transition probability is small and
remains so for all values of $T$. This result is required by the law of
the conservation of energy. The total energy $H$ is constant and hence
the proper-energy $H_0$ (i.e. the energy with neglect of the part $V$ due to
the perturbation), being approximately equal to $H$, must be approximately
constant. This means that if $H_0$ initially has the numerical value $H_0'$, at any
later time there must be only a small probability of its having a numerical value
differing considerably from $H_0'$.
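
Formula (25) can be compared with a direct numerical integration of the exact equation (13). The sketch below is a hedged illustration: it assumes a system with only two states, a constant real off-diagonal element $v$ with vanishing diagonal elements of $V$, units with $\hbar = 1$, and a hand-written fourth-order Runge-Kutta integrator; none of these choices comes from the text.

```python
import cmath
import math


def integrate_two_level(v, e1, e2, T, steps=4000, hbar=1.0):
    """Integrate equation (13) for two states, starting with a1 = 0, a2 = 1.

    e1, e2 are the unperturbed energies and v the constant Schrodinger-picture
    matrix element of V between the two states.
    """
    v = complex(v)

    def rhs(t, a1, a2):
        phase = cmath.exp(1j * (e1 - e2) * t / hbar)   # Heisenberg time factor
        da1 = -1j / hbar * v * phase * a2
        da2 = -1j / hbar * v.conjugate() * phase.conjugate() * a1
        return da1, da2

    h = T / steps
    t, a1, a2 = 0.0, 0j, 1 + 0j
    for _ in range(steps):
        k1 = rhs(t, a1, a2)
        k2 = rhs(t + h / 2, a1 + h / 2 * k1[0], a2 + h / 2 * k1[1])
        k3 = rhs(t + h / 2, a1 + h / 2 * k2[0], a2 + h / 2 * k2[1])
        k4 = rhs(t + h, a1 + h * k3[0], a2 + h * k3[1])
        a1 += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        a2 += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += h
    return a1, a2


def first_order_probability(v, e1, e2, T, hbar=1.0):
    """Equation (25) for the transition probability."""
    delta = e1 - e2
    return 2 * abs(v) ** 2 * (1 - math.cos(delta * T / hbar)) / delta ** 2


if __name__ == "__main__":
    a1, a2 = integrate_two_level(0.01, 1.0, 0.0, 2.0)
    print("numerical  :", abs(a1) ** 2)
    print("formula (25):", first_order_probability(0.01, 1.0, 0.0, 2.0))
```

For a perturbation this weak the two values differ only in the fourth significant figure, and the transition probability stays small at all times because the two proper-energies differ appreciably.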

On the other hand, when the initial state $\alpha'$ is such that there exists another
state $\alpha''$ having the same or very nearly the same proper-energy $H_0$, the probability
of a transition to the final state $\alpha''$ may be quite large. The case of physical
interest now is that in which there is a continuous range of final states $\alpha''$ having
a continuous range of proper-energy levels $H_0''$ passing through the value $H_0'$
of the proper-energy of the initial state. The initial state must not be one of
the continuous range of final states, but may be either a separate discrete state or
one of another continuous range of states. We shall now have, remembering the
rules of §28 for the interpretation of probability amplitudes with continuous ranges
of states, that, with $P(\alpha', \alpha'')$ having the value (25), the probability of a transition
to a final state within the small range $\alpha''$ to $\alpha'' + d\alpha''$ will be $P(\alpha', \alpha'')\,d\alpha''$ when
the initial state $\alpha'$ is discrete and will be proportional to this quantity when $\alpha'$ is
one of a continuous range.

We may suppose that the α's describing the final state, which are any complete
set of commuting dynamical variables that all commute with $H_0$, consist of $H_0$
itself together with a number of other dynamical variables β. (The β's need have
no meaning for the initial state $\alpha'$.) We shall suppose for definiteness that the
β's have only discrete eigenvalues. The total probability of a transition to a final
state $\alpha''$ for which the β's have the values $\beta''$ and $H_0$ has any value (there will be
a strong probability of its having a value near the initial value $H_0'$) will now be
(or be proportional to)

$$\int P(\alpha', \alpha'')\,dH_0'' = 2\int_{-\infty}^{\infty} |(\alpha'|V|H_0''\beta'')|^2\,[1 - \cos\{(H_0' - H_0'')T/\hbar\}]/(H_0' - H_0'')^2\,dH_0'' \qquad (26)$$

$$= \frac{2T}{\hbar} \int_{-\infty}^{\infty} |(\alpha'|V|H_0' + \hbar x/T, \beta'')|^2\,[1 - \cos x]/x^2\,dx$$

if one makes the substitution $(H_0'' - H_0')T/\hbar = x$. For large values of $T$ this reduces to

$$\frac{2T}{\hbar}\,|(\alpha'|V|H_0'\beta'')|^2 \int_{-\infty}^{\infty} [1 - \cos x]/x^2\,dx = \frac{2\pi T}{\hbar}\,|(\alpha'|V|H_0'\beta'')|^2. \qquad (27)$$

Thus the total probability up to time $T$ of a transition to a final state for which
the β's have the values $\beta''$ is proportional to $T$. There is therefore a definite
probability coefficient, or probability per unit time, for the transition process under
consideration, having the value

$$\frac{2\pi}{\hbar}\,|(\alpha'|V|H_0'\beta'')|^2. \qquad (28)$$

It is proportional to the square of the modulus of the matrix element, associated
with this transition, of the perturbing energy.
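
The step from (26) to (27) rests on the value $\pi$ of the integral of $[1 - \cos x]/x^2$ over all $x$. The following sketch checks this numerically; it is an illustration only, assuming Simpson's rule on a finite range with the tail approximated by $2/X$ for a cut-off $X$, and using function names invented for the example. The last function evaluates the probability coefficient (28) for an assumed matrix element, in units chosen so that $\hbar = 1$.

```python
import math


def kernel(x):
    """[1 - cos x] / x^2, written as 2 sin^2(x/2) / x^2; its limit at x = 0 is 1/2."""
    if x == 0.0:
        return 0.5
    s = math.sin(0.5 * x)
    return 2.0 * s * s / (x * x)


def kernel_integral(cutoff=2000.0, n=400000):
    """Estimate the integral of kernel over the whole real axis.

    Simpson's rule on [0, cutoff] (n even), doubled by symmetry, plus the
    approximate contribution 2/cutoff of the two tails.
    """
    h = cutoff / n
    total = kernel(0.0) + kernel(cutoff)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * kernel(k * h)
    return 2.0 * (total * h / 3.0) + 2.0 / cutoff


def probability_coefficient(v_abs, hbar=1.0):
    """Equation (28): probability per unit time for the transition."""
    return 2.0 * math.pi * v_abs ** 2 / hbar


if __name__ == "__main__":
    print("integral :", kernel_integral())
    print("pi       :", math.pi)
    print("rate for |V| = 0.01:", probability_coefficient(0.01))
```

The numerical integral reproduces $\pi$ to about six figures, which is the factor that turns $2T/\hbar$ in (26) into $2\pi T/\hbar$ in (27).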

In order that the approximations used in deriving (27) may be valid, the time T 
must be not too small and not too large. It must be large compared with the 
periods of the atomic system in order that the evaluation of the integral (26) 
leading to the result (27) may be correct, while it must not be excessively 
large or else the general formula (19) will break down. In fact one could make 
the probability (27) greater than unity by taking T large enough. The upper limit 
to T is fixed by the condition that the probability (19) or (27) must be small 
compared with unity. There is no difficulty in $T$ satisfying both these conditions
simultaneously provided the perturbing energy V is sufficiently small. 


55. The Anomalous Zeeman Effect 


One of the simplest examples of the perturbation method of §51 is the calculation
of the change in the energy-levels of a general atom caused by a uniform
magnetic field. The problem of a hydrogen atom in a uniform magnetic field
has already been dealt with in §48 and was so simple that perturbation theory
was unnecessary. The case of a general atom is not much more complicated when
we make a few approximations such that we can set up a simple model for the atom.

We first of all consider the atom in the absence of the magnetic field along
the lines indicated in §49 and look for angular momenta that are constants of
the motion. The total angular momentum of the atom, the vector $\mathbf{j}$ say, is certainly
a constant of the motion. This angular momentum may be regarded as the sum
of two parts, the total orbital angular momentum of all the electrons, $\mathbf{l}$ say,
and the total spin angular momentum, $\mathbf{s}$ say. Thus we have $\mathbf{j} = \mathbf{l} + \mathbf{s}$. Now the effect
of the spin magnetic moments on the motion of the electrons is small compared
with the effect of the Coulomb forces and may be neglected as a first approximation.
With this approximation the spin angular momentum of each electron is a constant
of the motion, there being no forces tending to change its orientation. Thus $\mathbf{s}$,
and hence also $\mathbf{l}$, will be constants of the motion. We now have the three constant
angular momenta $\mathbf{l}$, $\mathbf{s}$ and $\mathbf{j}$, related in the same way as the three angular momenta of §49.
The magnitudes, $l$, $s$ and $j$ say, of these angular momenta will be given by

$$l + \tfrac{1}{2}\hbar = (l_x^2 + l_y^2 + l_z^2 + \tfrac{1}{4}\hbar^2)^{1/2},$$

$$s + \tfrac{1}{2}\hbar = (s_x^2 + s_y^2 + s_z^2 + \tfrac{1}{4}\hbar^2)^{1/2},$$

$$j + \tfrac{1}{2}\hbar = (j_x^2 + j_y^2 + j_z^2 + \tfrac{1}{4}\hbar^2)^{1/2},$$

corresponding to equation (12) of Chapter VIII, and from (36) of that chapter
we see that with given numerical values for $l$ and $s$ the possible numerical values
for $j$ are

















$$l + s,\quad l + s - \hbar,\quad \ldots,\quad |l - s|.$$


Let us consider a stationary state for which $l$, $s$ and $j$ have definite numerical
values in agreement with the above scheme. The energy of this state will depend
on $l$, but one might think that with neglect of the spin magnetic moments it would
be independent of $s$, and also of the direction of the vector $\mathbf{s}$ relative to $\mathbf{l}$, and thus
of $j$. It will be found in Chapter XI, however, that the energy depends very
much on the magnitude $s$ of the vector $\mathbf{s}$, although independent of its direction
when one neglects the spin magnetic moments, on account of certain phenomena
arising from the fact that the electrons are indistinguishable one from another.
There are thus different energy-levels of the system for each different value of $l$
and $s$. This means that $l$ and $s$ are functions of the energy, according to the general
definition of a function given in §15, since the $l$ and $s$ of a stationary state are
fixed when the energy of that state is fixed.

We can now take into account the effect of the spin magnetic moments,
treating it as a small perturbation according to the method of §51. The energy
of the unperturbed system will still be approximately a constant of the motion
and hence $l$ and $s$, being functions of this energy, will still be approximately
constants of the motion. The directions of the vectors $\mathbf{l}$ and $\mathbf{s}$, however, not being
functions of the unperturbed energy, need not now be approximately constants of
the motion and may undergo large secular variations. Since the vector $\mathbf{j}$ is constant,
the only possible variation of $\mathbf{l}$ and $\mathbf{s}$ is a precession about the vector $\mathbf{j}$. We thus
have an approximate model of the atom consisting of the two vectors $\mathbf{l}$ and $\mathbf{s}$ of
constant lengths precessing about their sum $\mathbf{j}$, which is a fixed vector. The energy
is determined mainly by the magnitudes of $\mathbf{l}$ and $\mathbf{s}$ and depends only slightly on
their relative directions, specified by $j$. Thus states with the same $l$ and $s$ and
different $j$ will have only slightly different energy-levels, forming what is called
a multiplet term.

Let us now suppose our atom to be subjected to a uniform magnetic field of
magnitude $\mathcal{H}$ in the direction of the $z$-axis. The extra energy due to this magnetic
field will consist of a term

$$\frac{e\mathcal{H}}{2mc}\,(m_z + \hbar\sigma_z), \qquad (29)$$

like the last term in equation (34) of Chapter VIII, contributed by each electron,
and will thus be altogether

$$\frac{e\mathcal{H}}{2mc} \sum (m_z + \hbar\sigma_z) = \frac{e\mathcal{H}}{2mc}\,(l_z + 2s_z) = \frac{e\mathcal{H}}{2mc}\,(j_z + s_z). \qquad (30)$$

This is our perturbing energy $V$. We shall now use the method of §51 to determine
the changes in the energy-levels caused by this $V$. The method will be legitimate
only provided the field is so weak that $V$ is small compared with the energy
differences within a multiplet.

Our unperturbed system is degenerate, on account of the direction of
the vector $\mathbf{j}$ being undetermined. We must therefore take, from the representative
of $V$ in a Heisenberg representation for the unperturbed system, those matrix
elements that refer to one particular energy-level for their row and column,
and obtain the eigenvalues of the matrix thus formed. We can do this best by first
splitting up $V$ into two parts, one of which is a constant of the unperturbed motion,
so that its representative contains only matrix elements referring to the same
unperturbed energy-level for their row and column, while the representative of
the other contains only matrix elements referring to two different unperturbed
energy-levels for their row and column, so that this second part does not affect
the first-order perturbation. The term involving $j_z$ in (30) is a constant of
the unperturbed motion and thus belongs entirely to the first part. For the term
involving $s_z$ we have

$$s_z(j_x^2 + j_y^2 + j_z^2) = j_z(s_x j_x + s_y j_y + s_z j_z) + (s_z j_x - j_z s_x)\,j_x + (s_z j_y - j_z s_y)\,j_y$$

or

$$s_z = j_z\,\frac{(j + \tfrac{1}{2}\hbar)^2 - (l + \tfrac{1}{2}\hbar)^2 + (s + \tfrac{1}{2}\hbar)^2 - \tfrac{1}{4}\hbar^2}{2\{(j + \tfrac{1}{2}\hbar)^2 - \tfrac{1}{4}\hbar^2\}} + \frac{\gamma_x j_y - \gamma_y j_x}{(j + \tfrac{1}{2}\hbar)^2 - \tfrac{1}{4}\hbar^2}, \qquad (31)$$

where

$$\gamma_x = s_z j_y - j_z s_y = l_y s_z - l_z s_y, \qquad \gamma_y = j_z s_x - s_z j_x = l_z s_x - l_x s_z. \qquad (32)$$

The first term in this expression for $s_z$ is a constant of the unperturbed motion and
thus belongs entirely to the first part, while the second term, as we shall now see,
belongs entirely to the second part.

Corresponding to (32) we can introduce

$$\gamma_z = l_x s_y - l_y s_x.$$

It can now easily be verified that

$$j_x\gamma_x + j_y\gamma_y + j_z\gamma_z = 0$$

and that

$$[j_z, \gamma_x] = \gamma_y, \qquad [j_z, \gamma_y] = -\gamma_x, \qquad [j_z, \gamma_z] = 0.$$

These relations are of the same form as the relations (3), (4) and (5) of
Chapter VIII, so that our $\gamma_x$, $\gamma_y$, $\gamma_z$ are connected with the angular momentum $\mathbf{j}$




in the same way in which the $x$, $y$, $z$ of Chapter VIII were connected with
the angular momentum $\mathbf{m}$. We can thus take over the analysis of §47, in which
the condition was obtained for the non-vanishing of a matrix element of $x$, $y$
and $z$ in a representation in which $k$ is diagonal. We find in this way that
the only non-vanishing matrix elements of $\gamma_x$, $\gamma_y$ and $\gamma_z$ in a representation in
which $j$ is diagonal are those referring to transitions in which $j$ changes by $\pm\hbar$.
The coefficients of $\gamma_x$ and $\gamma_y$ in the second term on the right-hand side of (31)
commute with $j$, so that the representative of the whole of this term will contain
only matrix elements referring to transitions in which $j$ changes by $\pm\hbar$, and thus
referring to two different energy-levels of the unperturbed system.

Hence the perturbing energy $V$ becomes, when we neglect that part of it whose
representative consists of matrix elements referring to two different unperturbed
energy-levels,

$$\frac{e\mathcal{H}}{2mc}\,j_z \left\{1 + \frac{(j + \tfrac{1}{2}\hbar)^2 - (l + \tfrac{1}{2}\hbar)^2 + (s + \tfrac{1}{2}\hbar)^2 - \tfrac{1}{4}\hbar^2}{2\{(j + \tfrac{1}{2}\hbar)^2 - \tfrac{1}{4}\hbar^2\}}\right\}. \qquad (33)$$

The eigenvalues of this give the first-order changes in the energy-levels. We can
make the representative of this expression diagonal by choosing our representation
such that $j_z$ is diagonal, i.e. by taking the fundamental states to be spatially
quantized in the $z$-direction. The expression (33) then gives us directly
the first-order changes in the energy-levels caused by the magnetic field.
This expression is known as Landé's formula.
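
The factor in braces in (33) is easily evaluated for particular terms. The sketch below is an illustration under stated assumptions: $l$, $s$ and $j$ are measured in units of $\hbar$, exact rational arithmetic is used, and the function names are invented for the example.

```python
from fractions import Fraction


def lande_factor(l, s, j):
    """The factor in braces in (33), with l, s, j given in units of hbar."""
    l, s, j = Fraction(l), Fraction(s), Fraction(j)
    half, quarter = Fraction(1, 2), Fraction(1, 4)
    denominator = 2 * ((j + half) ** 2 - quarter)
    if denominator == 0:
        raise ValueError("the factor is undefined for j = 0")
    numerator = (j + half) ** 2 - (l + half) ** 2 + (s + half) ** 2 - quarter
    return 1 + numerator / denominator


def first_order_shift(l, s, j, jz):
    """Energy change from (33) in units of e*H*hbar/(2mc); jz is in units of hbar."""
    return lande_factor(l, s, j) * Fraction(jz)


if __name__ == "__main__":
    half = Fraction(1, 2)
    print(lande_factor(1, half, Fraction(3, 2)))   # doublet level with l = 1, j = 3/2
    print(lande_factor(1, half, half))             # doublet level with l = 1, j = 1/2
    print(lande_factor(0, half, half))             # pure spin, l = 0
    print(first_order_shift(1, half, Fraction(3, 2), Fraction(3, 2)))
```

For the doublet with $l = \hbar$ the two factors come out as 4/3 and 2/3, and a state with $l = 0$ gives the factor 2 appropriate to pure spin.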

The result (33) holds only provided the perturbing energy $V$ is small compared
with the energy differences within a multiplet. For larger values of $V$ a more
complicated theory is required. For very strong fields, however, for which $V$ is
large compared with the energy differences within a multiplet, the theory is again
very simple. We may now neglect altogether the energy of the spin magnetic
moments for the atom with no external field, so that for our unperturbed system
the vectors $\mathbf{l}$ and $\mathbf{s}$ themselves are constants of the motion, and not merely their
magnitudes $l$ and $s$. Our perturbing energy $V$, which is still $\frac{e\mathcal{H}}{2mc}(j_z + s_z)$,
is now a constant of the motion for the unperturbed system, so that its eigenvalues
give directly the changes in the energy-levels. These eigenvalues are integral or
half-odd integral multiples of $e\mathcal{H}\hbar/2mc$ according to whether the number of
electrons is even or odd.

56. General Remarks


In this chapter we shall investigate problems connected with a particle which,
coming from infinity, encounters or 'collides with' some atomic system and, after
being scattered through a certain angle, goes off to infinity again. The atomic
system which does the scattering we shall call, for brevity, the scatterer. We thus
have a dynamical system composed of an incident particle and a scatterer
interacting with each other, which we must deal with according to the laws of
quantum mechanics, and for which we must, in particular, calculate the probability
of scattering through any given angle. This problem was first solved by Max Born
by a method substantially equivalent to that of the next section. We must take
into account the possibility that the scatterer, considered as a system by itself, 
may have a number of different stationary states and that if it is initially in one 
of these states when the particle arrives from infinity, it may be left in a different 
one when the particle goes off to infinity again. The colliding particle may thus 
induce transitions in the scatterer. 

The Hamiltonian for the whole system of scatterer plus particle will not
involve the time explicitly, so that this whole system will have stationary states
represented by periodic solutions of Schrödinger's wave equation. The meaning
of these stationary states requires a little care to be properly understood. It is
evident that for any state of the system the particle will spend nearly all its time
at infinity, so that the time average of the probability of the particle being in
any finite volume will be zero. Now for a stationary state the probability of
the particle being in a given finite volume, like any other result of observation,
must be independent of the time, and hence this probability will equal its time
average, which we have seen is zero. We shall thus be interested only in the relative
probabilities of the particle being in different finite volumes, their absolute values
being all zero. Mathematically we have that if the ψ denoting a stationary
state is normalized correctly for physical interpretation, i.e. such that $\phi\psi = 1$,
and if we let $Q$ denote that observable, which is a certain function of the position
of the particle (at a given time), that is equal to unity if the particle is in a given
finite volume and zero otherwise, then $\phi Q\psi = 0$, meaning that the average value
of $Q$, i.e. the probability of the particle being in the given volume, is zero.
It would therefore be more convenient for us to denote the stationary state by
a ψ normalized to infinity, i.e. for which $\phi\psi \to \infty$, the infinity being such as
to make $\phi Q\psi$ finite. This finite $\phi Q\psi$ would then give the relative probability of
the particle being in the given volume.

In picturing a state of a system denoted by a $\psi$ which is not normalized correctly 
for physical interpretation, but for which $\phi\psi = n$ say, it may be convenient 
to suppose that we have $n$ similar systems all occupying the same space but 
with no interaction between them, so that each one follows out its own motion 
independently of the others. We can then interpret $\phi\alpha\psi$, where $\alpha$ is any 
observable, directly as the total $\alpha$ for all the $n$ systems. In applying these ideas 
to the above-mentioned $\psi$ normalized to infinity, denoting a stationary state of 
the system of scatterer plus colliding particle, we should picture an infinite number 
of such systems with the scatterers all located at the same point and the particles 
distributed continuously throughout space. The number of particles in a given 
finite volume would be pictured as $\phi Q\psi$, $Q$ being the observable defined above, 
which has the value unity when the particle is in the given volume and zero 
otherwise. If the $\psi$ is represented by a Schrödinger wave function involving 
the Cartesian co-ordinates of the particle, then the square of the modulus of 
the wave function could be interpreted directly as the density of particles in 
the picture. One must remember, however, that each of these particles has its own 
individual scatterer. Different particles may belong to scatterers in different states. 
There will thus be one particle density for each state of the scatterer, namely, 
the density of those particles belonging to scatterers in that state. This is 
taken account of by the wave function involving variables describing the state 
of the scatterer in addition to those describing the position of the particle. 

For determining scattering coefficients we have to investigate stationary states 
of the whole system of scatterer plus particle. For instance, if we want to determine 
the probability of scattering in various directions when the scatterer is initially in 
a given stationary state and the incident particle has initially a given velocity 
in a given direction, we must investigate that stationary state of the whole 
system whose picture, according to the above method, contains at great distances 
from the point of location of the scatterers only particles moving with the given 
initial velocity and direction and belonging each to a scatterer in the given 
initial stationary state, together with particles moving outward from the point 
of location of the scatterers and belonging possibly to scatterers in various 
stationary states. This picture corresponds closely to the actual state of affairs 
in an experimental determination of scattering coefficients, with the difference 
that the picture really describes only one actual system of scatterer plus particle. 
The distribution of outward moving particles at infinity in the picture gives 
us immediately all the information about scattering coefficients that could be 
obtained by experiment. For practical calculations about the stationary state 
described by this picture one may use the perturbation method of §51, taking as 
unperturbed system, for example, that for which there is no interaction between 
the scatterer and particle. 

In dealing with collision problems, a further possibility to be taken into 
consideration is that the scatterer may perhaps be capable of absorbing and 
re-emitting the particle. This possibility arises when there exist one or 
more states of absorption of the whole system, a state of absorption being 
an approximately stationary state which, at a certain time, is closed in the sense 
of §45 (i.e. the probability of the particle being at a greater distance than $r$ from 
the scatterer tends to zero as $r \to \infty$). Since a state of absorption is only 
approximately stationary, its property of being closed will be only a transient 
one and after a sufficient lapse of time there will be a finite probability of 
the particle being on its way to infinity. Physically this means there is a finite 
probability of spontaneous emission of the particle. The fact that we had to use 
the word ‘approximately’ in stating the conditions required for the phenomena 
of emission and absorption to be able to occur shows that these conditions are 
not expressible in exact mathematical language. One can give a meaning to these 
phenomena only when one is using a perturbation method. They occur when 
the unperturbed system (of scatterer plus particle) has stationary states that 
are closed. The perturbation now spoils the stationary property of these states 
and gives rise to spontaneous emission and its converse absorption. 

For calculating absorption and emission probabilities it is necessary to deal 
with non-stationary states of the system, in contradistinction to the case for 
scattering coefficients, so that the perturbation method of §52 must be used. 
Thus for calculating an emission coefficient we must consider the non-stationary 
states of absorption described above. Again, since an absorption is always followed 
by a re-emission, it cannot be distinguished from a scattering in any experiment 
involving a steady state of affairs, corresponding to a stationary state of the system. 
The distinction can be made only by reference to a non-steady state of affairs, 
e.g. by use of a stream of incident particles that has a sharp beginning, so that 
the scattered particles will appear immediately after the incident particles meet 
the scatterers, while those that have been absorbed and re-emitted will begin 
to appear only some time later. This stream of particles would then be the picture 
of a certain non-stationary $\psi$, normalized to infinity, which could be used for 
obtaining the absorption coefficient. 


57. The Scattering Coefficient 


We shall now consider the calculation of scattering coefficients, taking first the 
case when there is no absorption and emission, which means that our unperturbed 
system has no closed stationary states. We may conveniently take this unperturbed 
system to be that for which there is no interaction between the scatterer and 
particle. Its Hamiltonian will thus be of the form 

$$H_0 = H_s + W, \tag{1}$$

where $H_s$ is that for the scatterer alone and $W$ that for the particle alone, namely 

$$W = \frac{1}{2m}\,(p_x^2 + p_y^2 + p_z^2). \tag{2}$$

The perturbing energy $V$, assumed small, will now be a function of the Cartesian 
co-ordinates of the particle $x$, $y$, $z$ and also, perhaps, of its momenta $p_x$, $p_y$, $p_z$, 
together with dynamical variables describing the scatterer. 

Since we are now interested only in stationary states of the whole system, 
we can use the perturbation method of §51. Our unperturbed system now 
necessarily has a continuous range of energy-levels, since it contains a free 
particle, and this gives rise to certain modifications in the perturbation method. 
The question of the change in the energy-levels caused by the perturbation, which 
was the main question of §51, no longer has a meaning, and the convention in §51 
of using the same number of primes to denote nearly equal eigenvalues of $H_0$ 
and $H$ now becomes redundant. Again the problem of secular perturbations 
cannot now arise, since if the unperturbed system is degenerate the perturbed one, 
which must also have a continuous range of energy-levels, will also be degenerate 
to exactly the same extent. Any eigen-$\psi$ of the unperturbed Hamiltonian $H_0$, 
belonging to the eigenvalue $H_0'$ say, will be approximately equal to some eigen-$\psi$ 
of $H$, and indeed to each of an infinity of eigen-$\psi$'s of $H$ belonging to a small 
range of eigenvalues $H'$ approximately equal to $H_0'$. (The meaning of two 
$\psi$-symbols being approximately equal cannot be accurately defined in the case 
of continuous eigenvalues without a more rigorous theory than that aimed at in 
the present work. It should be noticed, though, that this meaning is such that 
two eigen-$\psi$'s of an observable belonging to two nearly equal eigenvalues may be 
approximately equal, in spite of the fact that they are orthogonal.) 

We again express the stationary state $\psi(H')$ of the perturbed system as 
the sum of an eigen-$\psi$, $\psi(H_0')$, of the unperturbed Hamiltonian and a small 
correction $\psi_1$. We can no longer, however, take $\psi_1$ to be orthogonal to $\psi(H_0')$, 
as in equation (5) of §51. The reason for this is that when we introduce our 
$\psi_1$ as in equation (4) of §51 and then express this $\psi_1$ as the sum of two parts, 
one a numerical multiple of $\psi(H_0')$, and the other orthogonal to $\psi(H_0')$, these parts 
may both be large, in the case of continuous eigenvalues $H_0'$, in spite of their sum 
being small. For example, these parts could be respectively of the form $\psi(H_0')$ and 
$-\psi(H_0' + \delta H_0')$. Thus we cannot have our $\psi_1$ both small and orthogonal to $\psi(H_0')$ 
and we prefer to have it small. To make up for this lack of simplicity in $\psi_1$ we can 
now take $H_0'$ exactly equal to $H'$. Let us call this number $H_0'$ or $H'$, equal to 
the energy of the stationary state we are seeking, $E$. We now have the equation 

$$(E - H_0)\,\psi(H') = V\psi(H'), \tag{3}$$

which gives 

$$(E - H_0)\,\psi_1 = V\psi(H_0')$$

or 

$$(E - H_s - W)\,\psi_1 = V\psi(H_0') \tag{4}$$

from (1), with neglect of the second-order term $V\psi_1$. We shall use this equation (4) 
for determining the stationary states of the perturbed system to the first order. 

Let $\alpha$ denote a complete set of commuting variables describing the scatterer, 
which are constants of the motion when the scatterer is alone, and may thus be 
used for labelling the stationary states of the scatterer. This requires that $H_s$ shall 
commute with the $\alpha$'s and be a function of them. We can now take a representation 
of the whole system in which the $\alpha$'s and $x$, $y$, $z$, the co-ordinates of the particle, 
are diagonal. This will make $H_s$ diagonal. Let $\psi(H_0')$ be represented by $(x\alpha|0)$ and 
$\psi_1$ by $(x\alpha|1)$, the single variable $x$ being written in the wave function to denote 
$x$, $y$ and $z$. In the same way the single differential $dx$ will be written to denote 
the product $dx\,dy\,dz$. Equation (4), written in terms of representatives, becomes, 
with the help of (2), 

$$\Big\{E - H_s(\alpha') + \frac{\hbar^2}{2m}\nabla^2\Big\}(x\alpha'|1) = \sum_{\alpha''}\int (x\alpha'|V|x''\alpha'')\,dx''\,(x''\alpha''|0). \tag{5}$$


Suppose that the incident particle has the momentum $p^0$ and that the 
initial stationary state of the scatterer is $\alpha^0$. The stationary state $\psi(H_0')$ of 
our unperturbed system is now the one for which $p = p^0$ and $\alpha = \alpha^0$, and hence 
its representative is of the form 

$$(x\alpha|0) = \delta_{\alpha\alpha^0}\,e^{i(p^0,x)/\hbar}. \tag{6}$$

This makes equation (5) reduce to 

$$\Big\{E - H_s(\alpha') + \frac{\hbar^2}{2m}\nabla^2\Big\}(x\alpha'|1) = \int (x\alpha'|V|x^0\alpha^0)\,dx^0\,e^{i(p^0,x^0)/\hbar}$$

or 

$$\{k^2 + \nabla^2\}(x\alpha'|1) = F, \tag{7}$$

where 

$$k^2 = \frac{2m}{\hbar^2}\,\{E - H_s(\alpha')\} \tag{8}$$

and 

$$F = \frac{2m}{\hbar^2}\int (x\alpha'|V|x^0\alpha^0)\,dx^0\,e^{i(p^0,x^0)/\hbar}, \tag{9}$$

a definite function of $x$, $y$, $z$ and $\alpha'$. We must also have 

$$E = H_0' = H_s(\alpha^0) + \frac{{p^0}^2}{2m}. \tag{10}$$


Our problem now is to obtain a solution $(x\alpha'|1)$ of (7) which, for values of $x$, $y$ and 
$z$ denoting points far from the scatterer, represents only outward moving particles. 
The square of its modulus, $|(x\alpha'|1)|^2$, will then give the density of scattered 
particles belonging to scatterers in the state $\alpha'$ when the density of the incident 
particles is $|(x\alpha|0)|^2$, which is unity. If we transform to polar co-ordinates $r$, $\theta$, $\phi$, 
equation (7) becomes 

$$\Big\{k^2 + \frac{\partial^2}{\partial r^2} + \frac{2}{r}\frac{\partial}{\partial r} + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\sin\theta\frac{\partial}{\partial\theta} + \frac{1}{r^2\sin^2\theta}\frac{\partial^2}{\partial\phi^2}\Big\}(r\theta\phi\alpha'|1) = F. \tag{11}$$

Now $F$ must tend to zero as $r \to \infty$, on account of the physical fact that 
the interaction energy between the scatterer and particle must tend to zero as 
the distance between them tends to infinity. If we neglect $F$ in (11) altogether, 
an approximate solution for large $r$ is 

$$(r\theta\phi\alpha'|1) = u(\theta,\phi,\alpha')\,r^{-1}e^{ikr}, \tag{12}$$





where $u$ is an arbitrary function of $\theta$, $\phi$ and $\alpha'$, since this expression substituted 
in the left-hand side of (11) gives a result of order $r^{-3}$. When we do not neglect $F$, 
the solution of (11) will still be of the form (12) for large $r$, provided $F$ tends to 
zero sufficiently rapidly as $r \to \infty$, but the function $u$ will now be definite and 
determined by the solution for smaller values of $r$. 
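
As a numerical illustration of the statement about (12), the following sketch checks by finite differences that $u\,r^{-1}e^{ikr}$ is annihilated by the radial part of the operator in (11). It assumes a $u$ that does not depend on the angles, in which case the angular terms drop out and the residual should vanish apart from discretization error; the function names, the step size and the sample values of $r$ and $k$ are illustrative choices, not part of the text.

```python
import cmath

def outgoing_wave(r, k, u=1.0):
    """Asymptotic form (12), u * exp(i k r) / r, for an angle-independent u."""
    return u * cmath.exp(1j * k * r) / r

def radial_helmholtz_residual(r, k, step=1e-3):
    """Central-difference value of (k^2 + d^2/dr^2 + (2/r) d/dr) applied to (12)."""
    f_minus = outgoing_wave(r - step, k)
    f_zero = outgoing_wave(r, k)
    f_plus = outgoing_wave(r + step, k)
    second = (f_plus - 2.0 * f_zero + f_minus) / step**2
    first = (f_plus - f_minus) / (2.0 * step)
    return k**2 * f_zero + second + 2.0 * first / r

if __name__ == "__main__":
    # The residual should be tiny compared with k^2 |f|, the size of a single term.
    r, k = 50.0, 2.0
    print(abs(radial_helmholtz_residual(r, k)), k**2 * abs(outgoing_wave(r, k)))
```

For a $u$ that does depend on $\theta$ and $\phi$ the angular terms of (11) no longer vanish, but they carry an extra factor $r^{-2}$, which is the origin of the order $r^{-3}$ quoted above.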

For values $\alpha'$ of the $\alpha$'s such that $k^2$, defined by (8), is positive, the $k$ in (12) 
must be chosen to be the positive square root of $k^2$, in order that (12) may represent 
only outward moving particles, i.e. particles for which the radial component of 
momentum $p_r$, represented by $-i\hbar\,\partial/\partial r$, has a positive value. We now have 
that the density of scattered particles belonging to scatterers in state $\alpha'$, equal 
to the square of the modulus of (12), falls off with increasing $r$ according to 
the inverse square law, as is physically necessary, and their angular distribution 
is given by $|u(\theta,\phi,\alpha')|^2$. Further, the magnitude, $P'$ say, of the momentum of 
these scattered particles is equal to $k\hbar$, since the exponential in (12) must be of 
the form $e^{iP'r/\hbar}$, so that their energy is equal to 

$$\frac{P'^2}{2m} = \frac{k^2\hbar^2}{2m} = E - H_s(\alpha') = H_s(\alpha^0) - H_s(\alpha') + \frac{{p^0}^2}{2m},$$

with the help of (8) and (10). This is just the energy of an incident particle, namely 
${p^0}^2/2m$, reduced by the increase in energy of the scatterer, namely $H_s(\alpha') - H_s(\alpha^0)$, 
in agreement with the law of conservation of energy. For values $\alpha'$ of the $\alpha$'s such 
that $k^2$ is negative there are no scattered particles, the total initial energy being 
insufficient for the scatterer to be left in the state $\alpha'$. 
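
The energy balance just described can be put into a few lines of code. The sketch below is only an illustration: the function name and its arguments are hypothetical, and the argument `scatterer_energy_gain` stands for $H_s(\alpha') - H_s(\alpha^0)$. It returns the magnitude $P'$ of the scattered momentum when $k^2$ is positive, and `None` when the final state $\alpha'$ cannot be reached.

```python
import math

def scattered_momentum(p0, mass, scatterer_energy_gain):
    """Solve P'^2/2m = p0^2/2m - [H_s(alpha') - H_s(alpha0)] for P'.

    Returns None when the right-hand side is not positive, i.e. when k^2 in (8)
    is not positive and no scattered particles belong to the state alpha'.
    """
    kinetic_energy = p0**2 / (2.0 * mass) - scatterer_energy_gain
    if kinetic_energy <= 0.0:
        return None
    return math.sqrt(2.0 * mass * kinetic_energy)

if __name__ == "__main__":
    print(scattered_momentum(3.0, 1.0, 0.0))  # elastic case: P' equals p0
    print(scattered_momentum(3.0, 1.0, 2.5))  # scatterer excited: P' reduced
    print(scattered_momentum(3.0, 1.0, 5.0))  # energy insufficient: None
```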

We must now evaluate $u(\theta,\phi,\alpha')$ for a set of values $\alpha'$ for the $\alpha$'s such that $k^2$ is 
positive, and obtain the angular distribution of the scattered particles belonging to 
scatterers in state $\alpha'$. It is sufficient to evaluate $u$ for the direction $\theta = 0$ of the pole 
of the polar co-ordinates, since this direction is arbitrary. We make use of Green's 
theorem, which states that for any two functions of position $A$ and $B$ the volume 
integral $\int (A\nabla^2 B - B\nabla^2 A)\,dx$ taken over any volume equals the surface integral 
$\int (A\,\partial B/\partial n - B\,\partial A/\partial n)\,dS$ taken over the boundary of the volume, $\partial/\partial n$ denoting 
differentiation along the normal to the surface. We take 

$$A = e^{-ikr\cos\theta}, \qquad B = (r\theta\phi\alpha'|1)$$


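and apply the theorem to a large sphere with the origin as centre. The volume 
integrand is thus 

$$e^{-ikr\cos\theta}\nabla^2(r\theta\phi\alpha'|1) - (r\theta\phi\alpha'|1)\nabla^2 e^{-ikr\cos\theta} = e^{-ikr\cos\theta}(\nabla^2 + k^2)(r\theta\phi\alpha'|1) = e^{-ikr\cos\theta}F$$

from (7) or (11), while the surface integrand is, with the help of (12), 

$$e^{-ikr\cos\theta}\frac{\partial}{\partial r}(r\theta\phi\alpha'|1) - (r\theta\phi\alpha'|1)\frac{\partial}{\partial r}e^{-ikr\cos\theta}$$
$$= e^{-ikr\cos\theta}\,u\Big(\frac{ik}{r} - \frac{1}{r^2}\Big)e^{ikr} + \frac{u}{r}\,e^{ikr}\,ik\cos\theta\,e^{-ikr\cos\theta}$$
$$= \frac{iku}{r}\,(1+\cos\theta)\,e^{ikr(1-\cos\theta)}$$

with neglect of $r^{-2}$. Hence we get 

$$\int e^{-ikr\cos\theta}F\,dx = \int_0^{2\pi}d\phi\int_0^{\pi} r^2\sin\theta\,d\theta\;\frac{iku}{r}\,(1+\cos\theta)\,e^{ikr(1-\cos\theta)}$$
$$= ikr\int_0^{2\pi}d\phi\int_0^{2}dy\;u(\theta,\phi,\alpha')\,(2-y)\,e^{ikry},$$

where $y = 1 - \cos\theta$, the volume integral on the left being taken over the whole 
of space. The right-hand side becomes, on being integrated by parts with respect 
to $y$, 

$$\int_0^{2\pi}d\phi\,\Big\{\Big[u(\theta,\phi,\alpha')\,(2-y)\,e^{ikry}\Big]_{y=0}^{y=2} - \int_0^2 dy\;e^{ikry}\,\frac{\partial}{\partial y}\big[u(\theta,\phi,\alpha')\,(2-y)\big]\Big\}.$$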
The second term in the $\{\}$ brackets is of the order of magnitude of $r^{-1}$, as would be 
revealed by further partial integrations, and may therefore be neglected. We are 
thus left with 

$$\int e^{-ikr\cos\theta}F\,dx = -2\int_0^{2\pi}d\phi\;u(0,\phi,\alpha') = -4\pi\,u(0,\phi,\alpha'),$$

giving the value of $u(0,\phi,\alpha')$ for the direction $\theta = 0$. 
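
The neglect of the second term can be checked numerically in the simplest case. The sketch below assumes $u = 1$ for all directions (a simplifying assumption made only for this illustration) and evaluates $ikr\int_0^2 (2-y)\,e^{ikry}\,dy$ by Simpson's rule; the helper names are hypothetical. For constant $u$ the integral can also be done in closed form, giving $-2 + (e^{2ikr}-1)/(ikr)$, so the value should approach $-2$ with a deviation of at most $2/kr$.

```python
import cmath

def simpson(func, lower, upper, intervals):
    """Composite Simpson rule for a complex-valued integrand (intervals even)."""
    width = (upper - lower) / intervals
    total = func(lower) + func(upper)
    for index in range(1, intervals):
        weight = 4 if index % 2 else 2
        total += weight * func(lower + index * width)
    return total * width / 3.0

def forward_surface_term(kr, intervals=20000):
    """i*kr * integral_0^2 (2 - y) exp(i*kr*y) dy, the y-integral with u = 1."""
    def integrand(y):
        return (2.0 - y) * cmath.exp(1j * kr * y)
    return 1j * kr * simpson(integrand, 0.0, 2.0, intervals)

if __name__ == "__main__":
    for kr in (20.0, 200.0):
        print(kr, forward_surface_term(kr))  # tends to -2 as kr grows
```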
This result may be written 

$$u(0,\phi,\alpha') = -\frac{1}{4\pi}\int e^{-iP'r\cos\theta/\hbar}F\,dx, \tag{13}$$

since $P' = k\hbar$. If the vector $p'$ denotes the momentum of the scattered electrons 
coming off in a certain direction (and is thus of magnitude $P'$), the value of $u$ for 
this direction will be 

$$u(\theta,\phi,\alpha') = -\frac{1}{4\pi}\int e^{-i(p',x)/\hbar}F\,dx,$$

as follows from (13) if one takes this direction to be the pole of the polar 
co-ordinates. This becomes, with the help of (9), 

$$u(\theta,\phi,\alpha') = -\frac{m}{2\pi\hbar^2}\iint e^{-i(p',x)/\hbar}\,dx\,(x\alpha'|V|x^0\alpha^0)\,dx^0\,e^{i(p^0,x^0)/\hbar} = -2\pi m h\,(p'\alpha'|V|p^0\alpha^0), \tag{14}$$

when one makes a transformation from the co-ordinates $x$ to the momenta $p$ of 
the particle, using the transformation function (36) of Chapter VI. The single 
letter $p$ is here used to denote the three components of momentum. 

The density of scattered particles belonging to scatterers in state $\alpha'$ is now 
given by $|u(\theta,\phi,\alpha')|^2/r^2$. Since their velocity is $P'/m$, the rate at which 
these particles appear per unit solid angle about the direction of the vector $p'$ will 
be $P'/m \cdot |u(\theta,\phi,\alpha')|^2$. The density of the incident particles is, as we have seen, 
unity, so that the number of incident particles crossing unit area per unit time is 
equal to their velocity $P^0/m$, where $P^0$ is the magnitude of $p^0$. Hence the effective 
area that must be hit by an incident particle in order to be scattered in a unit solid 
angle about the direction $p'$ and then belong to a scatterer in state $\alpha'$ will be 

$$\frac{P'}{P^0}\,|u(\theta,\phi,\alpha')|^2 = 4\pi^2 m^2 h^2\,\frac{P'}{P^0}\,|(p'\alpha'|V|p^0\alpha^0)|^2. \tag{15}$$

This is the scattering coefficient for transitions $\alpha^0 \to \alpha'$ of the scatterer. 
It depends on that matrix element $(p'\alpha'|V|p^0\alpha^0)$ of the perturbing energy $V$ 
whose column $p^0\alpha^0$ and whose row $p'\alpha'$ refer respectively to the initial and final 
states of the unperturbed system, between which the scattering transition process 
takes place. The result (15) is thus in some ways analogous to the result (19) or (20) 
of Chapter IX, although the numerical coefficients are different in the two cases, 
corresponding to the different natures of the two transition processes. 
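
As a worked illustration of (15), the sketch below evaluates the scattering coefficient numerically for a fixed scattering centre with the screened potential $V(r) = V_0 e^{-\mu r}/r$. This potential, the function names and the units with $\hbar = 1$ (so that $h = 2\pi$) are assumptions made for the example and do not come from the text. The scatterer is taken to have no internal variables, so that $P' = P^0$, and the matrix element is computed as $h^{-3}\int e^{-i(p'-p^0,\,x)/\hbar}\,V\,dx$, which is what the transformation function $h^{-3/2}e^{i(p,x)/\hbar}$ gives.

```python
import math

HBAR = 1.0                 # units with hbar = 1 (assumption for this example)
H = 2.0 * math.pi * HBAR   # h = 2*pi*hbar

def matrix_element(q, v0, mu, r_max=60.0, intervals=60000):
    """(p'|V|p0) = h^-3 * integral exp(-i(p'-p0, x)/hbar) V dx, with q = |p'-p0| > 0.

    For a central V the angular integration leaves
    (4*pi*hbar/q) * integral_0^inf r V(r) sin(q r / hbar) dr,
    evaluated here by Simpson's rule for V = v0 * exp(-mu r) / r.
    """
    width = r_max / intervals
    def integrand(r):
        return v0 * math.exp(-mu * r) * math.sin(q * r / HBAR)  # r * V(r) * sin(...)
    total = integrand(0.0) + integrand(r_max)
    for index in range(1, intervals):
        total += (4 if index % 2 else 2) * integrand(index * width)
    radial = total * width / 3.0
    return 4.0 * math.pi * HBAR / q * radial / H**3

def scattering_coefficient(p, theta, mass, v0, mu):
    """Result (15) with P' = P0 = p, for scattering through a nonzero angle theta."""
    q = 2.0 * p * math.sin(theta / 2.0)
    return 4.0 * math.pi**2 * mass**2 * H**2 * matrix_element(q, v0, mu) ** 2

if __name__ == "__main__":
    print(scattering_coefficient(1.0, math.pi / 2.0, 1.0, 0.5, 1.0))
```

For this potential the radial integral is elementary, and (15) reduces to $(2mV_0/\hbar^2)^2/(\mu^2 + q^2/\hbar^2)^2$, which gives a convenient check on the factors of $h$ and $\hbar$.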


58. Solution with the p-Representation 


The result (15) for the scattering coefficient makes a reference only to that 
representation in which the momentum p is diagonal. One would thus expect 
to be able to get a more direct proof of the result by working all the time in 
the p-representation, instead of working in the x-representation and transforming 
at the end to the p-representation, as was done in §57. This would not at first sight 
appear to be a great improvement, as the lack of directness of the x-representation 
method is offset by its greater ‘Anschaulichkeit’, it being possible to picture 
the square of the modulus of the x-representative of a state as the density 
of a stream of particles in process of being scattered. The x-representation 
method has, however, other more serious disadvantages. One of the main 
applications of the theory of collisions is to the case of photons as incident particles. 
Now a photon is not a simple particle but has a polarization. It is evident from 
classical electromagnetic theory that a photon with a definite momentum, i.e. one 
moving in a definite direction with a definite frequency, may have a definite state of 
polarization (linear, circular, &c.), while a photon with a definite position, which is 
to be pictured as an electromagnetic disturbance confined to a very small volume, 
cannot have any definite polarization. These facts mean that the polarization 
observable of a photon commutes with its momentum but not with its position. 
This results in the p-representation method being immediately applicable to 
the case of photons, it being only necessary to introduce the polarizing variable 
into the representatives and treat it along with the a’s describing the scatterer, 
while the x-representation method is not applicable. Further, in dealing with 
photons it is necessary to take the relativity variation of mass with velocity 
into account. This can easily be done in the p-representation method, but not 
so easily in the x-representation method. 

Equation (4) still holds when the relativity variation of mass with velocity is 
taken into account for the particle, but $W$ is now given by 

$$W^2/c^2 = m^2c^2 + P^2 = m^2c^2 + p_x^2 + p_y^2 + p_z^2 \tag{16}$$

instead of by (2). Written in terms of p-representatives, equation (4) becomes 

$$\{E - H_s(\alpha') - W\}(p\alpha'|1) = \sum_{\alpha''}\int (p\alpha'|V|p''\alpha'')\,dp''\,(p''\alpha''|0),$$

$W$ being here understood as a definite function of $p_x$, $p_y$, $p_z$ given by (16). This may 
be written 

$$\{W' - W\}(p\alpha'|1) = \sum_{\alpha''}\int (p\alpha'|V|p''\alpha'')\,dp''\,(p''\alpha''|0), \tag{17}$$

where 

$$W' = E - H_s(\alpha') \tag{18}$$

and is the energy required by the law of conservation of energy for 
a scattered particle belonging to a scatterer in state $\alpha'$. The p-representative 
of $\psi(H_0')$, obtained by transforming (6) with the transformation function (36) of 
Chapter VI, is 

$$(p\alpha|0) = h^{3/2}\,\delta_{\alpha\alpha^0}\,\delta(p - p^0), \tag{19}$$

as may be verified most easily by transforming this back to the x-representation. 
The $\delta(p - p^0)$ means the product 

$$\delta(p_x - p_x^0)\,\delta(p_y - p_y^0)\,\delta(p_z - p_z^0).$$

Equation (17) now becomes 

$$\{W' - W\}(p\alpha'|1) = h^{3/2}\,(p\alpha'|V|p^0\alpha^0). \tag{20}$$


We now make a canonical transformation from the Cartesian co-ordinates $p_x$, 
$p_y$, $p_z$ of $p$ to its polar co-ordinates $P$, $\omega$, $\chi$, given by 

$$p_x = P\cos\omega, \qquad p_y = P\sin\omega\cos\chi, \qquad p_z = P\sin\omega\sin\chi.$$

If in the new representation we take the weight function $P^2\sin\omega$, then the weight 
attached to any volume of p-space will be the same as in the previous 
p-representation, so that the canonical transformation will mean simply 
a relabelling of the rows and columns of the matrices without any alteration of 
the matrix elements or of the set of numbers representing a state. Thus (20) will 
become in the new representation 

$$\{W' - W\}(P\omega\chi\alpha'|1) = h^{3/2}\,(P\omega\chi\alpha'|V|P^0\omega^0\chi^0\alpha^0), \tag{21}$$

$W$ being now a function of the single variable $P$. 
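
The relabelling can be made concrete with a short sketch. The functions below (hypothetical names, written for this illustration) convert between the Cartesian and polar co-ordinates of $p$, with the polar axis along $p_x$ as in the formulas above, and integrate the weight function $P^2\sin\omega$ over a spherical shell to confirm that it reproduces the ordinary volume of that region of p-space.

```python
import math

def to_cartesian(P, omega, chi):
    """p_x = P cos(omega), p_y = P sin(omega) cos(chi), p_z = P sin(omega) sin(chi)."""
    return (P * math.cos(omega),
            P * math.sin(omega) * math.cos(chi),
            P * math.sin(omega) * math.sin(chi))

def to_polar(px, py, pz):
    """Inverse map for a nonzero vector, with chi returned in [0, 2*pi)."""
    P = math.sqrt(px * px + py * py + pz * pz)
    omega = math.acos(px / P)
    chi = math.atan2(pz, py) % (2.0 * math.pi)
    return P, omega, chi

def shell_volume(P_low, P_high, cells=400):
    """Midpoint-rule integral of the weight P^2 sin(omega) over P_low < P < P_high."""
    dP = (P_high - P_low) / cells
    d_omega = math.pi / cells
    total = 0.0
    for i in range(cells):
        P = P_low + (i + 0.5) * dP
        for j in range(cells):
            omega = (j + 0.5) * d_omega
            total += P * P * math.sin(omega) * dP * d_omega
    return 2.0 * math.pi * total  # the chi integration contributes a factor 2*pi

if __name__ == "__main__":
    print(to_polar(*to_cartesian(2.0, 0.7, 4.0)))
    print(shell_volume(1.0, 2.0), 4.0 * math.pi / 3.0 * (2.0**3 - 1.0**3))
```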

The coefficient of* $(P\omega\chi\alpha'|1)$, namely $\{W' - W\}$, is now simply a multiplying 
factor and not a differential operator as it was with the x-representation method. 
We can therefore divide out by this factor and obtain an explicit expression 
for $(P\omega\chi\alpha'|1)$. When, however, $\alpha'$ is such that $W'$, defined by (18), is greater 
than $mc^2$, this factor will have the value zero for a certain point in the domain 
of the variable $P$, namely the point $P = P'$, given in terms of $W'$ by (16). 
The function $(P\omega\chi\alpha'|1)$ will then have a singularity at this point. This singularity 
shows that $(P\omega\chi\alpha'|1)$ represents an infinite number of particles moving about at 
great distances from the scatterers with energies indefinitely close to $W'$ and it is 
therefore this singularity that we have to study to get the angular distribution of 
the particles at infinity. 

*Original: $(P\omega\chi\alpha'|)$ 

The result of dividing out (21) by the factor $\{W' - W\}$ is 

$$(P\omega\chi\alpha'|1) = h^{3/2}\,(P\omega\chi\alpha'|V|P^0\omega^0\chi^0\alpha^0)/\{W' - W\} + \lambda(\omega,\chi,\alpha')\,\delta(W' - W), \tag{22}$$


where $\lambda$ is an arbitrary function of $\omega$, $\chi$ and $\alpha'$, since when an arbitrary multiple 
of $\delta(W' - W)$ is multiplied by $W' - W$ the product will vanish. To give a meaning 
to the first term on the right-hand side of (22), we make the convention that 
its integral with respect to $P$ over a range that includes the value $P'$ is the limit 
when $\epsilon \to 0$ of the integral when the small domain $P' - \epsilon$ to $P' + \epsilon$ is excluded from 
the range of integration. This is sufficient to make the meaning of (22) precise, 
since we are interested effectively only in the integrals of the representatives 
of states when the representation has continuous ranges of rows and columns. 
We see that equation (21) is inadequate to determine the representative $(P\omega\chi\alpha'|1)$ 
completely, on account of the arbitrary function occurring in (22). We must 
choose this $\lambda$ such that $(P\omega\chi\alpha'|1)$ represents only outward moving particles, 
since we want the only inward moving particles to be those represented by (19). 

Let us take first the general case when the representative $(P\omega\chi|)$ of a state of 
the particle satisfies an equation of the type 

$$\{W' - W\}(P\omega\chi|) = f(P,\omega,\chi), \tag{23}$$

where $f(P,\omega,\chi)$ is any function of $P$, $\omega$ and $\chi$, and $W'$ is a number greater 
than $mc^2$, so that $(P\omega\chi|)$ is of the form 

$$(P\omega\chi|) = f(P,\omega,\chi)/\{W' - W\} + \lambda(\omega,\chi)\,\delta(W' - W), \tag{24}$$

and let us determine now what $\lambda$ must be in order that $(P\omega\chi|)$ may represent 
only outward moving particles. We can do this by transforming $(P\omega\chi|)$ 
to the x-representation, or rather the $(r\theta\phi)$-representation, and comparing it 
with (12) for large values of $r$. The transformation function is 

$$(r\theta\phi|P\omega\chi) = h^{-3/2}\,e^{i(p,x)/\hbar} = h^{-3/2}\,e^{iPr[\cos\omega\cos\theta + \sin\omega\sin\theta\cos(\chi-\phi)]/\hbar}.$$
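For the direction $\theta = 0$ we find 

$$(r0\phi|) = h^{-3/2}\int_0^\infty P^2\,dP\int_0^{2\pi}d\chi\int_0^{\pi}\sin\omega\,d\omega\;e^{iPr\cos\omega/\hbar}\,(P\omega\chi|)$$
$$= h^{-3/2}\int_0^\infty P^2\,dP\int_0^{2\pi}d\chi\,\Big\{\Big[-\frac{e^{iPr\cos\omega/\hbar}}{iPr/\hbar}\,(P\omega\chi|)\Big]_{\omega=0}^{\omega=\pi} + \int_0^{\pi}d\omega\,\frac{e^{iPr\cos\omega/\hbar}}{iPr/\hbar}\,\frac{\partial}{\partial\omega}(P\omega\chi|)\Big\}.$$

The second term in the $\{\}$ brackets is of order $r^{-2}$, as may be verified by further 
partial integrations with respect to $\omega$, and can therefore be neglected. We are 
left with 

$$(r0\phi|) = ih^{-1/2}(2\pi r)^{-1}\int_0^\infty P\,dP\int_0^{2\pi}d\chi\,\{e^{-iPr/\hbar}(P\pi\chi|) - e^{iPr/\hbar}(P0\chi|)\}$$
$$= ih^{-1/2}r^{-1}\int_0^\infty P\,dP\,\{e^{-iPr/\hbar}(P\pi\chi|) - e^{iPr/\hbar}(P0\chi|)\}. \tag{25}$$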


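When we substitute for $(P\omega\chi|)$ its value given by (24), the first term in 
the integrand in (25) gives 

$$ih^{-1/2}r^{-1}\int_0^\infty P\,dP\;e^{-iPr/\hbar}\,\{f(P,\pi,\chi)/(W' - W) + \lambda(\pi,\chi)\,\delta(W' - W)\}. \tag{26}$$

The term involving $\delta(W' - W)$ here may be integrated immediately and gives, 
when one uses the relation $P\,dP = W\,dW/c^2$, which follows from (16), 

$$ih^{-1/2}c^{-2}r^{-1}\int_{mc^2}^\infty W\,dW\;e^{-iPr/\hbar}\,\lambda(\pi,\chi)\,\delta(W' - W) = ih^{-1/2}c^{-2}r^{-1}\,W'\lambda(\pi,\chi)\,e^{-iP'r/\hbar}. \tag{27}$$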


To integrate the other term in (26) we use the formula 

$$\int_0^\infty g(P)\,\frac{e^{-iPr/\hbar}}{P' - P}\,dP = g(P')\int_0^\infty \frac{e^{-iPr/\hbar}}{P' - P}\,dP \tag{28}$$

with neglect of terms involving $r^{-1}$, for any continuous function $g(P)$, 
which formula holds since $\int_0^\infty K(P)\,e^{-iPr/\hbar}\,dP$ is of order $r^{-1}$ for any continuous 
function $K(P)$ and since the difference 

$$g(P)/(P' - P) - g(P')/(P' - P)$$

is continuous. The right-hand side of (28), when evaluated with neglect of terms 
involving $r^{-1}$, and also with neglect of the small domain $P' - \epsilon$ to $P' + \epsilon$ in the domain 
of integration, gives 

$$g(P')\int_0^\infty \frac{e^{-iPr/\hbar}}{P' - P}\,dP = g(P')\,e^{-iP'r/\hbar}\int_{-\infty}^{\infty}\frac{e^{i(P' - P)r/\hbar}}{P' - P}\,dP$$
$$= ig(P')\,e^{-iP'r/\hbar}\int_{-\infty}^{\infty}\frac{\sin[(P' - P)r/\hbar]}{P' - P}\,dP = i\pi g(P')\,e^{-iP'r/\hbar}. \tag{29}$$
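
The last step of (29) rests on the standard result that $\int_{-\infty}^{\infty}\sin(ax)/x\,dx = \pi$ for any positive $a$, here with $x = P' - P$ and $a = r/\hbar$. A minimal numerical sketch of this, with a hypothetical function name and a finite cut-off $L$ standing in for the infinite range, is:

```python
import math

def sine_integral_over_line(scale, half_width, intervals=200000):
    """Simpson estimate of integral_{-L}^{L} sin(scale * x) / x dx, L = half_width.

    The integrand is continuous at x = 0, where it takes the value `scale`.
    """
    def integrand(x):
        return scale if x == 0.0 else math.sin(scale * x) / x
    width = 2.0 * half_width / intervals
    total = integrand(-half_width) + integrand(half_width)
    for index in range(1, intervals):
        x = -half_width + index * width
        total += (4 if index % 2 else 2) * integrand(x)
    return total * width / 3.0

if __name__ == "__main__":
    # The estimate differs from pi by a term of order 1 / (scale * L).
    for scale in (1.0, 5.0):
        print(scale, sine_integral_over_line(scale, 200.0), math.pi)
```

The value does not depend on the scale $a$ once the range is large enough, which is why the factor $r/\hbar$ drops out of the right-hand side of (29).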


In our present example $g(P)$ is 

$$g(P) = ih^{-1/2}r^{-1}\,P f(P,\pi,\chi)\,(P' - P)/(W' - W),$$

which has the limiting value when $P = P'$, 

$$g(P') = ih^{-1/2}r^{-1}\,P' f(P',\pi,\chi)\,W'/P'c^2 = ih^{-1/2}c^{-2}r^{-1}\,W' f(P',\pi,\chi).$$


Substituting this in (29) and adding on the expression (27), we obtain the following 
value for the integral (26) 

$$h^{-1/2}c^{-2}r^{-1}\,W'\{-\pi f(P',\pi,\chi) + i\lambda(\pi,\chi)\}\,e^{-iP'r/\hbar}. \tag{30}$$

Similarly the second term in the integrand in (25) gives 

$$h^{-1/2}c^{-2}r^{-1}\,W'\{-\pi f(P',0,\chi) - i\lambda(0,\chi)\}\,e^{iP'r/\hbar}. \tag{31}$$

The sum of these two expressions is the value of $(r0\phi|)$ when $r$ is large. 
We require that $(r0\phi|)$ shall represent only outward moving particles, and hence 
it must be of the form of a multiple of $e^{iP'r/\hbar}$. Thus (30) must vanish, so that 

$$\lambda(\pi,\chi) = -i\pi f(P',\pi,\chi). \tag{32}$$


We see in this way that the condition that $(r\theta\phi|)$ shall represent only outward 
moving particles in the direction $\theta = 0$ fixes the value of $\lambda$ for the opposite direction 
$\omega = \pi$. Since the direction $\theta = 0$ or $\omega = 0$ of the pole of our polar co-ordinates is 
not in any way singular, we can generalize (32) to 

$$\lambda(\omega,\chi) = -i\pi f(P',\omega,\chi), \tag{33}$$

which gives the value of $\lambda$ for an arbitrary direction. This value substituted in 
(24) gives a result that may be written 

$$(P\omega\chi|) = f(P,\omega,\chi)\,\{1/(W' - W) - i\pi\,\delta(W' - W)\}, \tag{34}$$

since one can substitute $P'$ for $P$ in the coefficient of a term involving $\delta(W' - W)$ as 
a factor without changing the value of the term. The condition that $(P\omega\chi|)$ shall 
represent only outward moving particles is thus that it shall contain the factor 

$$\{1/(W' - W) - i\pi\,\delta(W' - W)\}. \tag{35}$$
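
A remark that is not in the text: the factor (35), with its first term understood in the excluded-domain sense described above, coincides with the limit of $1/(W' - W + i\eta)$ as the small positive number $\eta$ tends to zero, a standard identity for such singular factors. The sketch below illustrates this numerically under stated assumptions: it writes $x$ for $W' - W$, integrates over a symmetric range in $x$, and uses the smooth test function $g(x) = e^{-x^2}$, all of which are choices made for the example. The real part should then reproduce the symmetric-exclusion integral (zero here, because $g$ is even) and the imaginary part should approach $-\pi g(0)$.

```python
import math

def smeared_factor_integral(test_function, eta, half_width=10.0, intervals=40000):
    """Simpson estimate of integral g(x) / (x + i*eta) dx over -L < x < L."""
    width = 2.0 * half_width / intervals
    def integrand(x):
        return test_function(x) / complex(x, eta)
    total = integrand(-half_width) + integrand(half_width)
    for index in range(1, intervals):
        total += (4 if index % 2 else 2) * integrand(-half_width + index * width)
    return total * width / 3.0

def gaussian(x):
    """Even test function with g(0) = 1 (an arbitrary choice for the demonstration)."""
    return math.exp(-x * x)

if __name__ == "__main__":
    for eta in (0.1, 0.01):
        print(eta, smeared_factor_integral(gaussian, eta))  # tends to 0 - i*pi
```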

With $\lambda$ given by (33), expression (30) vanishes and the value of $(r0\phi|)$ for 
large $r$ is given by expression (31) alone, thus 

$$(r0\phi|) = -2\pi h^{-1/2}c^{-2}r^{-1}\,W' f(P',0,\chi)\,e^{iP'r/\hbar}.$$

This may be generalized to 

$$(r\theta\phi|) = -2\pi h^{-1/2}c^{-2}r^{-1}\,W' f(P',\omega,\chi)\,e^{iP'r/\hbar},$$

giving the value of $(r\theta\phi|)$ for any direction $\theta$, $\phi$ in terms of $f(P',\omega,\chi)$ for the 
same direction labelled by $\omega$, $\chi$. This is of the form (12) with 

$$u(\theta,\phi) = -2\pi h^{-1/2}c^{-2}\,W' f(P',\omega,\chi)$$

and thus represents a distribution of outward moving particles of momentum $P'$ 
whose number is 

$$\frac{c^2P'}{W'}\,|u|^2 = \frac{4\pi^2 W'P'}{hc^2}\,|f(P',\omega,\chi)|^2 \tag{36}$$

per unit solid angle per unit time. This distribution is the one represented by 
the $(P\omega\chi|)$ of (34). 

From this general result we can infer that, whenever we have a representative 
$(P\omega\chi|)$ representing only outward moving particles and satisfying an equation of 
the type (23), the number per unit solid angle per unit time of these particles is 
given by (36). If this $(P\omega\chi|)$ occurs in a problem in which the number of incident 
particles is one per unit volume, it will correspond to a scattering coefficient 
of amount 

$$\frac{4\pi^2 W^0 W' P'}{hc^4 P^0}\,|f(P',\omega,\chi)|^2. \tag{37}$$

It is only the value of the function $f(P,\omega,\chi)$ for the point $P = P'$ that is 
of importance. 
If we now apply this general theory to our equations (21) and (22), we have 

$$f(P,\omega,\chi) = h^{3/2}\,(P\omega\chi\alpha'|V|P^0\omega^0\chi^0\alpha^0).$$

Hence from (37) the scattering coefficient is 

$$\frac{4\pi^2 h^2\,W^0 W' P'}{c^4 P^0}\,|(P'\omega\chi\alpha'|V|P^0\omega^0\chi^0\alpha^0)|^2. \tag{38}$$

If one neglects relativity and puts $W^0W'/c^4 = m^2$, this result reduces to 
the result (15) obtained in the preceding section by means of Green's theorem. 
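
The reduction of (38) to (15) can be illustrated numerically. The sketch below uses hypothetical function names and arbitrary sample values, and assumes $W = c\,(m^2c^2 + P^2)^{1/2}$ as the positive root of (16). It compares the kinematic factor $W^0W'P'/c^4P^0$ of (38) with the factor $m^2P'/P^0$ of (15) for momenta small compared with $mc$, and also checks the relation $P\,dP = W\,dW/c^2$ used in deriving (27).

```python
import math

def energy(P, mass, c):
    """Positive root of (16): W = c * sqrt(m^2 c^2 + P^2)."""
    return c * math.sqrt(mass**2 * c**2 + P**2)

def kinematic_factor_relativistic(P0, P1, mass, c):
    """W0 * W' * P' / (c^4 * P0), the factor multiplying 4 pi^2 h^2 |V|^2 in (38)."""
    return energy(P0, mass, c) * energy(P1, mass, c) * P1 / (c**4 * P0)

def kinematic_factor_nonrelativistic(P0, P1, mass):
    """m^2 * P' / P0, the factor multiplying 4 pi^2 h^2 |V|^2 in (15)."""
    return mass**2 * P1 / P0

def energy_slope(P, mass, c, step=1e-5):
    """Central-difference estimate of dW/dP, to compare with c^2 P / W."""
    return (energy(P + step, mass, c) - energy(P - step, mass, c)) / (2.0 * step)

if __name__ == "__main__":
    # Momenta much smaller than m*c: the two factors agree closely.
    print(kinematic_factor_relativistic(1.0, 0.5, 1.0, 1000.0),
          kinematic_factor_nonrelativistic(1.0, 0.5, 1.0))
    # Momenta comparable with m*c: the relativistic factor is visibly larger.
    print(kinematic_factor_relativistic(1.0, 0.5, 1.0, 1.0),
          kinematic_factor_nonrelativistic(1.0, 0.5, 1.0))
```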


59. Dispersive Scattering 


We shall now determine the scattering when the incident particle is capable of 
being absorbed, that is, when our unperturbed system of scatterer plus particle 
has closed stationary states with the particle absorbed. The existence of these 
closed states for the unperturbed system will be found to have a considerable effect 
on the scattering for the perturbed system, and indeed an effect that depends very 
much on the energy of the incident particle, giving rise to the phenomenon of 
dispersion in optics when the incident particle is taken to be a photon. 

We use a representation for which the basic kets correspond to the stationary 
states of the unperturbed system, as was the case with the p-representation of 
the preceding section. These stationary states are now the states $\psi(p'\alpha')$ for which 
the particle has a definite momentum $p'$ and the scatterer is in a definite state 
$\alpha'$, together with the closed states, $\psi_k$ say, which form a separate discrete set. 
We shall assume that these states are all independent and orthogonal, so that 
our representation is of the usual orthogonal type. This assumption is probably 
not justifiable when the particle is an electron or atomic nucleus, since in this case 
for an absorbed state $\psi_k$ the particle will still certainly be somewhere, so that 
one would expect to be able to expand $\psi_k$ in terms of the eigen-$\psi$'s $\psi(x'\alpha')$ of 
$x$, $y$, $z$ and the $\alpha$'s, and hence also in terms of the $\psi(p'\alpha')$'s. On the other hand, 
when the particle is a photon it will no longer exist for the absorbed states, 
which are then certainly independent of and orthogonal to the states $\psi(p'\alpha')$ 
for which the particle does exist. Thus the assumption is valid in this case, which 
is an important practical one. 

The representative of a state will now consist of a discrete set of numbers $(k|)$ 
referring to the fundamental states $\psi_k$, together with the three-dimensional 
continuous ranges of numbers $(p'\alpha'|)$ referring to the $\psi(p'\alpha')$, there being one 
such range for each set of values $\alpha'$ for the $\alpha$'s. Similarly the matrices representing 
observables will now contain discrete rows and columns labelled by $k$ together with 
continuous ranges labelled by $p$, $\alpha$. Thus, for example, the matrix representing $V$, 
the perturbing energy, will have elements $(k'|V|k'')$, $(k'|V|p''\alpha'')$, $(p'\alpha'|V|k'')$ and 
$(p'\alpha'|V|p''\alpha'')$. 

Since we are concerned with scattering, we must still deal with stationary 
states of the whole system, which will still be given by an equation of the type (3). 
We shall now, however, have to work to the second order of accuracy, so that 
we cannot simply use the first-order equation (4). The exact equation (3) gives, 
when written in terms of representatives, 

$$\{W' - W\}(p\alpha'|) = \sum_{\alpha''}\int (p\alpha'|V|p''\alpha'')\,dp''\,(p''\alpha''|) + \sum_{k''}(p\alpha'|V|k'')(k''|),$$
$$\{E - E_k\}(k|) = \sum_{\alpha''}\int (k|V|p''\alpha'')\,dp''\,(p''\alpha''|) + \sum_{k''}(k|V|k'')(k''|), \tag{39}$$

where $W'$ is given by (18) and $E_k$ is the energy of the stationary state $\psi_k$ of 
the unperturbed system. If we suppose the exact $\psi(H')$ to be expressed as the sum 
of $\psi(H_0')$, a first-order correction $\psi_1$, a second-order correction $\psi_2$, and so on, thus 

$$\psi(H') = \psi(H_0') + \psi_1 + \psi_2 + \cdots,$$

the r-th-order correction will be given in terms of the (r − 1)-th by 

$$(E - H_0)\,\psi_r = V\psi_{r-1}.$$


Thus its representative $(p\alpha'|r)$, $(k|r)$ will be given by 

$$\{W' - W\}(p\alpha'|r) = \sum_{\alpha''}\int (p\alpha'|V|p''\alpha'')\,dp''\,(p''\alpha''|r-1) + \sum_{k''}(p\alpha'|V|k'')(k''|r-1),$$
$$\{E - E_k\}(k|r) = \sum_{\alpha''}\int (k|V|p''\alpha'')\,dp''\,(p''\alpha''|r-1) + \sum_{k''}(k|V|k'')(k''|r-1). \tag{40}$$

For $r = 1$ these equations are just the generalization of (17) when there exist 
absorbed states $\psi_k$. The unperturbed stationary state $\psi(p^0\alpha^0)$ will now be 
represented by 

$$(p\alpha|0) = h^{3/2}\,\delta_{\alpha\alpha^0}\,\delta(p - p^0), \qquad (k|0) = 0, \tag{41}$$

instead of merely by (19), so the first-order correction will be given by 

$$\{W' - W\}(p\alpha'|1) = h^{3/2}\,(p\alpha'|V|p^0\alpha^0), \tag{42}$$
$$\{E - E_k\}(k|1) = h^{3/2}\,(k|V|p^0\alpha^0). \tag{43}$$

We may assume that the matrix elements (k'|V|k") of V vanish,
since these matrix elements are not essential to the phenomena under investigation,
and if they did not vanish it would mean simply that the absorbed states
ψ_k had not been suitably chosen. We shall further assume that the matrix
elements (p'a'|V|p"a") are of the second order of smallness when the matrix
elements (k'|V|p"a"), (p'a'|V|k") are taken to be of the first order of smallness.
This assumption will be justified for the case of photons in Chapter XII. We now
have from (43) and (42) that (k|1) is of the first order of smallness, provided E
does not lie near one of the discrete set of energy-levels E_k, and (pa'|1) is of
the second order. The value of (pa'|2) to the second order will thus be given,
from the first of equations (40), by


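$$ \{W' - W\}(pa'|2) = h^{3/2} \sum_{k''} (pa'|V|k'')(k''|V|p^0a^0)/\{E - E_{k''}\}. $$

The total correction of the second order, arising partly from (pa'|1) and partly
from (pa'|2), therefore satisfies

$$ \{W' - W\}\{(pa'|1) + (pa'|2)\} = h^{3/2} \Big\{ (pa'|V|p^0a^0) + \sum_k \frac{(pa'|V|k)(k|V|p^0a^0)}{E - E_k} \Big\}. $$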


This equation is of the type (23), provided a' is such that W' > mc², which means
that a' as a final state for the scatterer is not inconsistent with the law of
conservation of energy. We can therefore infer from the general result (37) that
the scattering coefficient is

$$ \frac{4\pi^2 h^2 W^0 W' P'}{c^4 P^0} \left| (p'a'|V|p^0a^0) + \sum_k \frac{(p'a'|V|k)(k|V|p^0a^0)}{E - E_k} \right|^2. \tag{44} $$

The scattering may now be considered as composed of two parts, a part that 
arises from the matrix element (p'a'|V|p⁰a⁰) of the perturbing energy and a part
that arises from the matrix elements (p'a'|V|k) and (k|V|p⁰a⁰). The first part,
which is the same as our previously obtained result (38), may be called the true 
scattering. The second part may be considered as arising from an absorption of 
the incident particle into some state k, followed immediately by a re-emission in 
a different direction. The fact that we have to add the two terms before taking 
the square of the modulus denotes interference between the two kinds of scattering. 
There is no experimental way of separating the two kinds, the distinction between 
them being only mathematical. 
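
The interference between the two kinds of scattering can be illustrated numerically. The following is a minimal sketch in Python, assuming two made-up complex amplitudes (the names `direct` and `resonant` and their values are illustrative only and are not taken from the text). It shows that adding the amplitudes before taking the squared modulus, as (44) prescribes, differs from adding the two intensities by a cross term.

```python
# Illustrative amplitudes only; the numerical values are assumptions.
direct = 0.3 + 0.1j      # stands for the matrix element (p'a'|V|p0a0)
resonant = -0.2 + 0.4j   # stands for the sum over k in formula (44)

def intensity(amplitude):
    """Squared modulus of a complex amplitude."""
    return abs(amplitude) ** 2

coherent = intensity(direct + resonant)                 # amplitudes added first
incoherent = intensity(direct) + intensity(resonant)    # intensities added
cross_term = 2 * (direct.conjugate() * resonant).real   # interference term

print(coherent, incoherent, cross_term)
```

The difference `coherent - incoherent` equals `cross_term`, which is the part of the scattering that cannot be ascribed to either kind alone.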


60. Resonance Scattering 


Suppose the energy of the incident particle to be varied continuously while
the initial state a⁰ of the scatterer is kept fixed, so that the total energy E varies
continuously. The formula (44) now shows that as E approaches one of the discrete
set of energy-levels E_k, the scattering becomes very large. In fact, according to
formula (44) the scattering should be infinite when E is exactly equal to an E_k.
An infinite scattering coefficient is, of course, physically impossible, so that we can
infer that the approximations used in deriving (44) are no longer legitimate when
E is close to an E_k. To investigate the scattering in this case we must therefore
go back to the exact equations (39) and use a different method of approximating
to their solution.

Let us take one particular E_k and consider the case when E is close to it.
The large term in the scattering coefficient (44) now arises from those elements of
the matrix representing V that lie in row k or in column k, i.e. those of the type
(k|V|pa) or (pa|V|k). The scattering arising from the other matrix elements of V
is of a smaller order of magnitude. This suggests that in our exact equations (39)
we should make the approximation of neglecting all the matrix elements of V
except the important ones, which are those of the type (pa'|V|k) or (k|V|pa'),
where a' is a state of the scatterer that has not too much energy to be disallowed
as a final state by the law of conservation of energy. These equations then reduce to


$$ \{W' - W\}(pa'|) = (pa'|V|k)(k|) \tag{45} $$
$$ \{E - E_k\}(k|) = \sum_{a'} \int (k|V|pa')\,dp\,(pa'|), \tag{46} $$


the a' summation being over those values of a' for which W' given by (18)
is > mc². These equations are now sufficiently simple for us to be able to solve
them exactly without further approximation.

From (45) we obtain by division 


$$ (pa'|) = (pa'|V|k)(k|)/\{W' - W\} + \lambda\,\delta(W' - W). \tag{47} $$


We must choose λ, which may be any function of the momentum p and a',
such that (47) represents the incident particles (19) together with only outward
moving particles. [The right-hand side of (19), with a' substituted for a, is actually
of the form λδ(W' − W), since the conditions a' = a⁰ and p = p⁰ for this right-hand
side not to vanish lead to W' = E − H_s(a') = E − H_s(a⁰) = W⁰ and W = W⁰,
which together give W' = W.] Thus (47) must be


$$ (pa'|) = h^{3/2}\,\delta_{a'a^0}\,\delta(p - p^0) + (pa'|V|k)(k|)\,\{1/(W' - W) - i\pi\,\delta(W' - W)\}, \tag{48} $$

and from the general formula (37) the scattering coefficient will be

$$ \frac{4\pi^2 W^0 W' P'}{h c^4 P^0}\,|(p'a'|V|k)|^2\,|(k|)|^2. \tag{49} $$

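It remains for us to determine the value of (k|). We can do this by substituting
for (pa'|) in (46) its value given by (48). This gives

$$
\begin{aligned}
\{E - E_k\}(k|) &= h^{3/2}\,(k|V|p^0a^0) + (k|) \sum_{a'} \int |(k|V|pa')|^2 \{1/(W' - W) - i\pi\,\delta(W' - W)\}\,dp \\
&= h^{3/2}\,(k|V|p^0a^0) + (k|)\{a - ib\},
\end{aligned}
$$

where

$$ a = \sum_{a'} \int |(k|V|pa')|^2\,dp/(W' - W) \tag{50} $$

and

$$
\begin{aligned}
b &= \pi \sum_{a'} \int |(k|V|pa')|^2\,\delta(W' - W)\,dp \\
&= \pi \sum_{a'} \iiint |(k|V|P\omega\chi a')|^2\,\delta(W' - W)\,P^2\,dP\,\sin\omega\,d\omega\,d\chi \\
&= \pi \sum_{a'} \frac{P'W'}{c^2} \iint |(k|V|P'\omega\chi a')|^2\,\sin\omega\,d\omega\,d\chi.
\end{aligned}
\tag{51}
$$

Thus

$$ (k|) = h^{3/2}\,(k|V|p^0a^0)/\{E - E_k - a + ib\}. \tag{52} $$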


Note that a and b are real and that b is positive.
This value for (k|) substituted in (49) gives for the scattering coefficient

$$ \frac{4\pi^2 h^2 W^0 W' P'}{c^4 P^0}\,\frac{|(p'a'|V|k)|^2\,|(k|V|p^0a^0)|^2}{(E - E_k - a)^2 + b^2}. \tag{53} $$


One can obtain the total effective area that the incident particle must hit in order
to be scattered anywhere by integrating (53) over all directions of scattering,
i.e. by integrating over all directions of the vector p' with its magnitude kept
fixed at P', and then summing over all a' that are to be taken into consideration,
i.e. for which W' > mc². This gives, with the help of (51), the result

$$ \frac{4\pi h^2 W^0}{c^2 P^0}\,\frac{b\,|(k|V|p^0a^0)|^2}{(E - E_k - a)^2 + b^2}. \tag{54} $$


If we suppose E to vary continuously through the value E_k, the main variation
of (53) or (54) will be due to the small denominator (E − E_k − a)² + b². If we neglect
the dependence of the other factors in (53) and (54) on E, then the maximum
scattering will occur when E has the value E_k + a and the scattering will be half
its maximum when E differs from this value by an amount b. The large amount
of scattering that occurs for values of the energy of the incident particle that
make E nearly equal to E_k gives rise to the phenomenon of an absorption line.
The centre of the line is displaced by an amount a from the resonance energy of
the incident particle, i.e. the energy which would make the total energy just E_k,
while the quantity b is what is sometimes called the half-width of the line.
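
The shape of the absorption line can be checked with a few lines of Python. This is only a sketch: it keeps the energy-dependent factor b/{(E − E_k − a)² + b²} of (54) and drops the other factors, and the numerical values chosen for E_k, a and b are hypothetical, in arbitrary energy units.

```python
def line_shape(E, E_k, a, b):
    """Energy-dependent factor b / ((E - E_k - a)**2 + b**2) of formula (54)."""
    return b / ((E - E_k - a) ** 2 + b ** 2)

# Hypothetical numbers in arbitrary energy units.
E_k, a, b = 10.0, 0.25, 0.5

peak = line_shape(E_k + a, E_k, a, b)            # value at the displaced centre
half_up = line_shape(E_k + a + b, E_k, a, b)     # one half-width above the centre
half_down = line_shape(E_k + a - b, E_k, a, b)   # one half-width below the centre

# Scan a grid of energies to locate the maximum.
grid = [E_k - 5 + 0.01 * i for i in range(1001)]
E_max = max(grid, key=lambda E: line_shape(E, E_k, a, b))

print(peak, half_up / peak, half_down / peak, E_max)
```

The maximum is found at E_k + a, and the scattering falls to half its maximum at a distance b on either side, in agreement with the statement above.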


61. Emission and Absorption 


For studying emission and absorption we must consider non-stationary states of the
system and must use the perturbation method of §52. To determine the coefficient
of spontaneous emission we must take a state for which the particle is initially
absorbed, so that the representative of the state is then

$$ (k|) = 1, \qquad (pa|) = 0, $$

and determine the probability that at some later time the particle shall be on
its way to infinity with a definite momentum. The method of §54 can now
be applied. From the result (28) of that section we see that the probability per
unit time per unit range of ω and χ of the particle being emitted in any direction
ω', χ' with the scatterer being left in state a' is

$$ \frac{2\pi}{\hbar}\,|(k|V|W'\omega'\chi'a')|^2, \tag{55} $$


provided, of course, that a' is such that the energy W', given by (18), of the particle
is greater than mc². For values of a' that do not satisfy this condition there
is no emission possible. The matrix element (k|V|W'ω'χ'a') here must refer to
a representation in which W, ω, χ and a are diagonal with the weight function
unity. The matrix elements of V appearing in the three preceding sections refer to
a representation in which p_x, p_y, p_z are diagonal with the weight function unity,
or P, ω, χ are diagonal with the weight function P² sin ω. They would thus
refer to a representation in which W, ω, χ are diagonal with the weight function
dP/dW · P² sin ω = WP/c² · sin ω. Thus the matrix element (k|V|W'ω'χ'a') in (55)
is equal to (W'P'/c² · sin ω')^{1/2} times our previous matrix element (k|V|W'ω'χ'a')
or (k|V|p'a'), so that (55) is equal to


$$ \frac{2\pi}{\hbar}\,\frac{W'P'}{c^2}\,\sin\omega'\,|(k|V|p'a')|^2. $$

The probability of emission per unit solid angle per unit time, with the scatterer
simultaneously dropping to state a', is thus

$$ \frac{2\pi}{\hbar}\,\frac{W'P'}{c^2}\,|(k|V|p'a')|^2. \tag{56} $$

To obtain the total probability per unit time of the particle being emitted
in any direction, with any final state for the scatterer, we must integrate (56)
over all angles ω', χ' and sum over all states a' whose energy H_s(a') is such that
H_s(a') + mc² < E_k. The result is just 2b/ħ, where b is defined by (51). There is
thus this simple relation between the total emission coefficient and the half-breadth
b of the absorption line.

Let us now consider absorption. This requires that we shall study a state for
which initially the particle is certainly not absorbed but is incident with a definite
momentum. Thus the initial representative of the state must be of the form (41).
We must now determine the probability of the particle being absorbed after time T.
Since our final state ψ_k is not one of a continuous range, we cannot use directly
the result (28) of §54. If, however, we take

$$ (pa|)_0 = \delta_{aa^0}\,\delta(p - p^0), \qquad (k|)_0 = 0, \tag{57} $$

as the initial representative of the state, the analysis of §§52 and 54 is still
applicable as far as equation (25) and shows us that the probability of the particle
being absorbed into state ψ_k after time T is

$$ 2\,|(k|V|p^0a^0)|^2\,[1 - \cos\{(E_k - E)T/\hbar\}]/(E_k - E)^2. $$

This corresponds to a distribution of incident particles of density h⁻³, owing to
the omission of the factor h^{3/2} from (57), as compared with (41). The probability of
there being an absorption after time T when there is one incident particle crossing
unit area per unit time is therefore

$$ \frac{2h^3W^0}{c^2P^0}\,|(k|V|p^0a^0)|^2\,[1 - \cos\{(E_k - E)T/\hbar\}]/(E_k - E)^2. \tag{58} $$


To obtain the absorption coefficient we must consider the incident particles not
all to have exactly the same energy W⁰ = E − H_s(a⁰), but to have a distribution
of energy values about the correct value E_k − H_s(a⁰) required for absorption.
If we take a beam of incident particles consisting of one crossing unit area per
unit time per unit energy range, the probability of there being an absorption after
time T will be given by the integral of (58) with respect to E. This integral may
be evaluated in the same way as (26) of §54 and is equal to

$$ \frac{4\pi^2h^2W^0T}{c^2P^0}\,|(k|V|p^0a^0)|^2. $$

The probability per unit time of an absorption taking place with an incident beam
of one particle per unit area per unit time per unit energy range is therefore

$$ \frac{4\pi^2h^2W^0}{c^2P^0}\,|(k|V|p^0a^0)|^2, \tag{59} $$


which is the absorption coefficient. 

The connexion between the absorption and emission coefficients (59) and (56) 
and the resonance scattering coefficients calculated in the preceding section should 
be noted. When the incident beam does not consist of particles all with the same 
energy, but consists of a unit distribution of particles per unit energy range crossing 
unit area per unit time, the total number of incident particles with energies near 
an absorption line that get scattered will be given by the integral of (54) with
respect to E. If one neglects the dependence of the numerator of (54) on E,
this integral will, since

$$ \int_{-\infty}^{\infty} \frac{b\,dE}{(E - E_k - a)^2 + b^2} = \pi, $$

have just the value (59). Thus the total number of scattered particles in
the neighbourhood of an absorption line is equal to the total number absorbed. 
We can therefore regard all these scattered particles as absorbed particles that are 
subsequently re-emitted in a different direction. Further, the number of particles 
in the neighbourhood of the absorption line that get scattered per unit solid angle 
about a given direction p’ and then belong to scatterers in state a’ will be given by 
the integral with respect to E of (53), which integral has in the same way the value 


$$ \frac{4\pi^3h^2W^0W'P'}{c^4P^0\,b}\,|(p'a'|V|k)|^2\,|(k|V|p^0a^0)|^2. $$

This is just equal to the absorption coefficient (59) multiplied by the emission
coefficient (56) divided by 2b/ħ, the total emission coefficient. This is in agreement
with the point of view of regarding the resonance scattered particles as those that 
are absorbed and then re-emitted, according to which point of view the fraction 
of the total number of absorbed particles that are re-emitted in a unit solid angle 
about a given direction would be just the emission coefficient for this direction 
divided by the total emission coefficient, provided the absorption and emission 
processes are governed independently each by its own probability law. 
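
The equality between the integrated resonance scattering and the absorption coefficient rests on the integral of the line-shape factor b/{(E − E_k − a)² + b²} over all energies being π. A minimal numerical sketch in Python follows. The midpoint rule, the finite integration window and the values of E_k, a and b are all assumptions made for illustration. The window is finite, so the result falls short of π by the small contribution of the tails.

```python
import math

def integrate_line(E_k, a, b, half_range, steps):
    """Midpoint-rule integral of b / ((E - E_k - a)**2 + b**2) over a finite window."""
    centre = E_k + a
    width = 2 * half_range / steps
    total = 0.0
    for i in range(steps):
        E = centre - half_range + (i + 0.5) * width
        total += b / ((E - E_k - a) ** 2 + b ** 2) * width
    return total

# Hypothetical values in arbitrary energy units.
numeric = integrate_line(E_k=10.0, a=0.25, b=0.5, half_range=500.0, steps=200_000)
exact_window = 2 * math.atan(500.0 / 0.5)   # exact integral over the same window

print(numeric, exact_window, math.pi)
```

As the window is widened the value tends to π, which is the factor that turns (54) into (59).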


XI. SYSTEMS CONTAINING 
SEVERAL SIMILAR PARTICLES 


62. Symmetrical and Antisymmetrical States 


IF a system in atomic physics contains a number of particles of the same kind, 
e.g. a number of electrons, the particles are absolutely indistinguishable one from 
another. No observable change is made when two of them are interchanged. 
This circumstance gives rise to some curious phenomena in quantum mechanics 
having no analogue in the classical theory, which arise from the fact that in 
quantum mechanics a transition may occur resulting in merely the interchange 
of two similar particles, which transition then could not be detected by any 
observational means. A satisfactory theory ought, of course, to count two 
observationally indistinguishable states as the same state and to deny that any 
transition does occur when two similar particles exchange places. We shall 
find that such a theory can be developed in agreement with the principles of 
quantum mechanics. 

Suppose we have a system containing n similar particles. We may take
as our dynamical variables a set of variables ξ_1 describing the first particle,
the corresponding set ξ_2 describing the second particle, and so on up to the set ξ_n
describing the n-th particle. We shall then have the ξ_r's commuting with the ξ_s's
for r ≠ s. (We may require certain extra variables, describing what the system
consists of in addition to the n similar particles, but it is not necessary to mention
these explicitly in the present chapter.) The Hamiltonian describing the motion
of the system will now be expressible as a function of the ξ_1, ξ_2, ..., ξ_n. The fact
that the particles are similar requires that the Hamiltonian shall be a symmetrical
function of the ξ_1, ξ_2, ..., ξ_n, i.e. it shall remain unchanged when the sets of
variables ξ_r are interchanged or permuted in any way. This condition must hold
no matter what perturbations are applied to the system.

We may take a representation with observables q_1, q_2, ..., q_n diagonal, which
are such that the q_1's are the values at time t of certain commuting dynamical
variables describing the first particle, the q_2's are the values at time t of the
corresponding variables describing the second particle, and so on. We may further
choose the phases of the representation in the same way for each of the particles.
(This means, for example, that if a certain momentum p_1 describing the first
particle is represented by −iħ ∂/∂q_1, the corresponding momentum p_r describing
the r-th particle must be represented by −iħ ∂/∂q_r.) The representation will then
treat all the particles on the same footing. The condition that the Hamiltonian H is
symmetrical between all the particles may now be expressed by the condition that
its representative (q'_1 q'_2 ... q'_n|H|q"_1 q"_2 ... q"_n), or (q'|H|q") for brevity,
is symmetrical between all the q's, i.e. that it remains unchanged if any permutation
is applied to the q''s and the same permutation to the q"'s. This condition may be
expressed analytically thus,


$$ (q'|H|q'') = (Pq'|H|Pq''), \tag{1} $$

where P denotes any permutation of the numbers 1, 2, ..., n and Pq' denotes
the set of numbers obtained by applying the permutation P to the suffixes of
q'_1, q'_2, ..., q'_n.

Let (q'_1 q'_2 ... q'_n|) or (q'|) be the wave function representing any state. It will
satisfy the wave equation

$$ i\hbar\,\frac{\partial}{\partial t}(q'|) = \int (q'|H|q'')\,dq''\,(q''|). \tag{2} $$


If we apply any permutation P to the variables q' in (q'|) we shall obtain a function
(Pq'|) satisfying

$$ i\hbar\,\frac{\partial}{\partial t}(Pq'|) = \int (Pq'|H|q'')\,dq''\,(q''|) = \int (Pq'|H|Pq'')\,dq''\,(Pq''|), $$

since we can apply any permutation to the variables of integration q" in
the integrand without changing the value of the integral. With the help of (1)
this becomes

$$ i\hbar\,\frac{\partial}{\partial t}(Pq'|) = \int (q'|H|q'')\,dq''\,(Pq''|), \tag{3} $$


which shows that (Pq’|) is a solution of the wave equation (2). Hence if we apply 
any permutation to the variables in a solution of the wave equation we obtain 
another solution. 

Suppose we take a state whose representative (q’|) at some particular time t is 
a symmetrical function of all the q’’s, so that 


$$ (q'|) = (Pq'|) \tag{4} $$

for any P. The right-hand sides of (2) and (3) are now equal, so that

$$ \frac{\partial}{\partial t}(q'|) = \frac{\partial}{\partial t}(Pq'|). $$

This equation is the time derivative of (4) and shows that if (4) holds at one
particular time it holds also at a slightly later time, and thus by induction it holds
at all times. Thus if a wave function is initially symmetrical it always remains
symmetrical.

Similarly we may take a state whose representative (q'|) at some particular
time is antisymmetrical, i.e. (q'_1 q'_2 ... q'_n|) changes sign with interchange of any
pair of q''s. We shall then have

$$ (q'|) = \pm(Pq'|), \tag{5} $$

the + or − sign being taken according to whether the permutation P is even or odd
(i.e. according to whether P can be built up from an even or an odd number of
simple interchanges). The same argument as before now shows that if a wave
function is initially antisymmetrical it always remains antisymmetrical.
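
The permanence of the symmetry can be seen in a small discrete model. The sketch below is in Python and rests on several assumptions: two particles, each q taking only three discrete values, the integral in (2) replaced by a sum, and a randomly generated real matrix standing in for the representative of H, made to satisfy condition (1). It checks that the right-hand side of the wave equation, which fixes the rate of change of the wave function, has the same symmetry as the wave function itself.

```python
import itertools
import random

random.seed(0)
sites = range(3)                                   # hypothetical discrete values of each q
states = list(itertools.product(sites, repeat=2))  # pairs (q1, q2)

def swap(q):
    """The permutation P that interchanges the two particles."""
    return (q[1], q[0])

# A random matrix made to satisfy condition (1): H[q', q''] == H[Pq', Pq''].
raw = {(r, c): random.random() for r in states for c in states}
H = {(r, c): 0.5 * (raw[(r, c)] + raw[(swap(r), swap(c))])
     for r in states for c in states}

def apply_H(psi):
    """Right-hand side of the wave equation (2), with the integral replaced by a sum."""
    return {r: sum(H[(r, c)] * psi[c] for c in states) for r in states}

def symmetry_defect(psi, sign):
    """Largest violation of psi(q') == sign * psi(Pq')."""
    return max(abs(psi[q] - sign * psi[swap(q)]) for q in states)

base = {q: random.random() for q in states}
symmetric = {q: base[q] + base[swap(q)] for q in states}
antisymmetric = {q: base[q] - base[swap(q)] for q in states}

defect_sym = symmetry_defect(apply_H(symmetric), +1)
defect_anti = symmetry_defect(apply_H(antisymmetric), -1)
print(defect_sym, defect_anti)
```

Both defects vanish to rounding error, so in this model a symmetrical function acquires only symmetrical increments and an antisymmetrical one only antisymmetrical increments.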

Let us make a canonical transformation to a Q-representation which, like the 
original g-representation, treats all the particles on the same footing. This means 
that the Q's consist of corresponding sets of observables Q_1, Q_2, ..., Q_n describing
the first, second, ..., n-th particle respectively and that the phases are chosen in
the same way for each of the particles. The transformation function will now be
of the form

$$ (Q'_1 Q'_2 \ldots Q'_n|q'_1 q'_2 \ldots q'_n) = (Q'_1|q'_1)(Q'_2|q'_2)\cdots(Q'_n|q'_n), \tag{6} $$

in which each factor (Q'_r|q'_r) is the same function of its variables Q'_r and q'_r.
This condition gives, if we denote (Q'_1 Q'_2 ... Q'_n|q'_1 q'_2 ... q'_n) by (Q'|q') for brevity,

$$ (Q'|q') = (PQ'|Pq'), \tag{7} $$

for an arbitrary permutation P. The new representative of any state is given by

$$ (Q'|) = \int (Q'|q')\,dq'\,(q'|). \tag{8} $$

From this equation we can deduce that

$$ (PQ'|) = \int (PQ'|q')\,dq'\,(q'|) = \int (PQ'|Pq')\,dq'\,(Pq'|) = \int (Q'|q')\,dq'\,(Pq'|) \tag{9} $$

with the help of (7). Now if (q'|) is symmetrical, so that equation (4) holds,
the right-hand sides of (8) and (9) are equal. We then have (Q'|) = (PQ'|),
so that (Q'|) is also symmetrical. Similarly if (q'|) is antisymmetrical, (Q'|)
is also antisymmetrical. Thus the property of the representative of a state
of being symmetrical or antisymmetrical remains invariant under a canonical
transformation. This invariance, together with the fact proved above that a wave
function if initially symmetrical or antisymmetrical always remains so, shows that
the property of being symmetrical or antisymmetrical is a property of the states
themselves and not merely a property of their representatives. Thus we can talk
about symmetrical and antisymmetrical states.

The invariance and permanence of the symmetry properties of the states means 
that for some particular kind of particle it is quite possible for only symmetrical 
or only antisymmetrical states to occur in nature. Whether this is the case 
cannot be decided by any general theoretical considerations, but can be settled
only by reference to special experimentally determined facts about the particles 
in question. For photons one can settle the question by making use of Planck’s 
radiation law. Only when one assumes the symmetrical states for photons does 
one get a statistical mechanics leading to Planck’s law for radiation in statistical 
equilibrium. This statistical mechanics is known as the Einstein-Bose statistics, 
as it was first introduced by Satyendra Nath Bose and Albert Einstein before the 
arrival of the modern quantum mechanics. 

For electrons we use the fact that, if we make the approximation of regarding 
the electrons in an atom as each moving in its own ‘orbit’ (i.e. as being each 
describable by its own wave function involving only its own variables), then no 
two electrons will ever be in the same orbit. This fact, which is known as 
Pauli’s exclusion principle, may be inferred from general experimental evidence 
on atomic structure. Let us see how to fit it in with the theory. If the wave 
functions representing the different orbits are 


$$ (q'|a_1),\ (q'|a_2),\ \ldots,\ (q'|a_n), $$

a wave function representing the whole atom will be given by the product

$$ (q'_1|a_1)(q'_2|a_2)\cdots(q'_n|a_n) = (q'|a), \tag{10} $$

say, for brevity. Other wave functions representing the same distribution of
electrons over the various orbits may be obtained by applying any permutation
to the a's in (10). There will be altogether n! such wave functions, the general one
being (q'|Pa). Any linear combination of these wave functions will also represent
the same electron distribution. One such linear combination is the sum

$$ \sum_P (q'|Pa), \tag{11} $$

which is symmetrical between all the q''s. Another is

$$ \sum_P \pm(q'|Pa), \tag{12} $$

the + or − sign being taken according to whether P is an even or odd permutation,
and this one is antisymmetrical. The antisymmetrical wave function (12)
has the property that it vanishes identically if two of the a's are equal.
Hence if we assume that for electrons only antisymmetrical states occur, we shall 
get the result that there are no states with two electrons in the same orbit, which is 
just Pauli’s exclusion principle. This assumption is the only one we can make which 
will lead to Pauli’s exclusion principle. 
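
The vanishing of the antisymmetrical combination (12) when two orbits coincide can be verified directly. The following Python sketch builds the sums (11) and (12) for three particles. The one-particle functions `orbit(a, q)` are hypothetical stand-ins for the (q'|a), chosen only for illustration, and each q is taken to be a single real number.

```python
import itertools
import math

def sign(perm):
    """+1 if the permutation (a tuple of 0..n-1) is even, -1 if it is odd."""
    perm, result = list(perm), 1
    for i in range(len(perm)):
        while perm[i] != i:
            j = perm[i]
            perm[i], perm[j] = perm[j], perm[i]   # one simple interchange
            result = -result
    return result

def orbit(a, q):
    """Hypothetical one-particle wave function (q|a); any family of functions would do."""
    return q ** a * math.exp(-q * q / 2)

def combined(q, labels, antisymmetric):
    """Sum (11), or signed sum (12), over all permutations of the orbit labels."""
    total = 0.0
    for perm in itertools.permutations(range(len(labels))):
        weight = sign(perm) if antisymmetric else 1
        total += weight * math.prod(orbit(labels[perm[i]], q[i]) for i in range(len(q)))
    return total

q = (0.3, -1.1, 0.7)
distinct = combined(q, (0, 1, 2), antisymmetric=True)    # three different orbits
repeated = combined(q, (0, 1, 1), antisymmetric=True)    # two electrons in one orbit
symmetric = combined(q, (0, 1, 1), antisymmetric=False)  # the sum (11) for comparison
print(distinct, repeated, symmetric)
```

The signed sum with a repeated label is zero to rounding error, while the symmetrical sum with the same labels is not, which is the content of the argument above.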

In this way we can see that for photons we must take the symmetrical 
states and for electrons the antisymmetrical states. These are special cases 
of an empirical rule, which appears to hold without exception, according 
to which only the symmetrical or only the antisymmetrical states occur 
according to whether the particles in question carry a charge of an even or 
an odd multiple of the electronic charge. When only the symmetrical or only 
the antisymmetrical states are allowed for a particular kind of particle, the theory 
can no longer make a distinction between two states which differ only through 
a permutation of the particles, so that the difficulties mentioned at the beginning 
of this section disappear. 


63. Permutations as Observables 


Let us now build up a general theory for a system containing n similar particles
when states with any kind of symmetry properties are allowed, i.e. when there
is no restriction to only symmetrical or only antisymmetrical states. The general
state now will not be symmetrical or antisymmetrical, nor will it be expressible
linearly in terms of symmetrical and antisymmetrical states when n > 2.

If P denotes any permutation and ψ any ψ-symbol, we can give a meaning
to Pψ, the ψ-symbol obtained by operating on ψ with P. We define Pψ to be
the ψ-symbol whose representative is (Pq'|), obtained by applying the permutation
P to the representative (q'|) of ψ. This Pψ is independent of the representation
used for defining it, as follows from equation (9). Further, the operation by which
Pψ is obtained from ψ is a linear one. Hence we can regard Pψ as the product of
an observable P with ψ, i.e. we can regard the permutation P as an observable.

There are n! permutations, each of which can be regarded as an observable.
One of them, P_1 say, is the identical permutation, which is equal to unity.
If ψ denotes a symmetrical state, we have

$$ P\psi = \psi \tag{13} $$

for any P, and hence a symmetrical ψ is an eigen-ψ of every permutation belonging
to the eigenvalue unity. Similarly an antisymmetrical ψ is an eigen-ψ of every
permutation belonging to the eigenvalue ±1 according to whether the permutation
is even or odd. The product of any two permutations is a third permutation and
hence any function of the permutations is reducible to a linear function of them.
Any permutation P has a reciprocal P⁻¹ satisfying PP⁻¹ = P⁻¹P = P_1 = 1.

A permutation P, like any other observable, can be represented by a matrix.
Its q-representative (q'|P|q") will satisfy

$$ \int (q'|P|q'')\,dq''\,(q''|) = (Pq'|) $$

and hence

$$ (q'|P|q'') = \delta(Pq' - q'') \tag{14} $$
$$ \phantom{(q'|P|q'')} = \delta(q' - P^{-1}q''). \tag{15} $$

The δ function in (14) or (15) denotes the product of n factors of the type
δ({Pq'}_r − q"_r) or δ(q'_r − {P⁻¹q"}_r) respectively. The conjugate complex of P
is given by

$$ (q'|\bar{P}|q'') = \overline{(q''|P|q')} = \delta(q'' - P^{-1}q') = (q'|P^{-1}|q'') $$

from (15) and (14), so that

$$ \bar{P} = P^{-1}. \tag{16} $$

Thus a permutation is not in general a real observable, its conjugate complex being
equal to its reciprocal.
Any permutation of the numbers 1, 2, 3, ..., n may be expressed in the cyclic
notation, e.g. with n = 8

$$ P_a = (143)(27)(58)(6), \tag{17} $$

in which each number is to be replaced by the succeeding number in a bracket,
unless it is the last in a bracket, when it is to be replaced by the first in
that bracket. Thus P_a changes the numbers 12345678 into 47138625. The type of
any permutation is specified by the partition of the number n which is provided by
the number of numbers in each of the brackets. Thus the type of P_a is specified by
the partition 8 = 3+2+2+1. Permutations of the same type, i.e. corresponding to
the same partition, we shall call similar. Thus, for example, P_a in (17) is similar to

$$ P_b = (871)(35)(46)(2). \tag{18} $$

The whole of the n! possible permutations may be divided into sets of similar
permutations, each such set being called a class. The permutation P_1 = 1 forms
a class by itself. Any permutation is similar to its reciprocal.
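
The cyclic notation is easily mechanized. The following Python sketch treats a permutation as a mapping of the numbers 1, ..., n. The helper names `from_cycles`, `cycle_type` and `inverse` are introduced here for illustration only. It reproduces the example (17) and the partition that specifies its type.

```python
def from_cycles(cycles):
    """Build the mapping 'number -> its replacement' from a list of brackets."""
    mapping = {}
    for cycle in cycles:
        for i, x in enumerate(cycle):
            # each number goes to its successor; the last goes to the first
            mapping[x] = cycle[(i + 1) % len(cycle)]
    return mapping

def cycle_type(mapping):
    """Partition of n given by the bracket lengths, largest first."""
    seen, lengths = set(), []
    for start in sorted(mapping):
        if start in seen:
            continue
        length, x = 0, start
        while x not in seen:
            seen.add(x)
            x = mapping[x]
            length += 1
        lengths.append(length)
    return sorted(lengths, reverse=True)

def inverse(mapping):
    """The reciprocal permutation."""
    return {image: x for x, image in mapping.items()}

P_a = from_cycles([(1, 4, 3), (2, 7), (5, 8), (6,)])   # equation (17)
P_b = from_cycles([(8, 7, 1), (3, 5), (4, 6), (2,)])   # equation (18)

image = "".join(str(P_a[x]) for x in range(1, 9))
print(image, cycle_type(P_a), cycle_type(P_b), cycle_type(inverse(P_a)))
```

P_a and P_b have the same cycle type and so are similar, and the reciprocal of P_a has that type as well.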

When two permutations P_a and P_b are similar, either of them P_b may be
obtained by making a certain permutation P in the other P_a. Thus, in our
example (17), (18) we can take P to be the permutation that changes 14327586
into 87135462, i.e. the permutation

$$ P = (18623)(475). $$

We then have the algebraic relation between P_a and P_b

$$ P_b = P P_a P^{-1}. \tag{19} $$

To verify this, we observe that the product P_aψ of P_a with any ψ is changed into
P_bψ if one applies the permutation P to the P_a in the product but not to the ψ.
If we multiply the product by P on the left, we are applying this permutation to
the whole ψ-symbol P_aψ and thus to both the P_a and the ψ, so that we must insert
another factor P⁻¹ between the P_a and the ψ, giving us PP_aP⁻¹ to equate to P_b.
An alternative proof consists in noting that when the permutation P is applied to
the representative δ(P_aq' − q") of P_a, it gives δ(PP_aq' − Pq") or δ(PP_aP⁻¹q' − q"),
which is just the representative of PP_aP⁻¹.

Equation (19) is the general formula showing when two permutations P_a and
P_b are similar. Of course P is not uniquely determined when P_a and P_b are given,
but the existence of any P satisfying (19) is sufficient to show that P_a and P_b are
similar.
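
Relation (19) can be checked for the example (17), (18). The sketch below is in Python and assumes that permutations are treated simply as mappings of the numbers 1, ..., 8, with the product PP_aP⁻¹ read as "apply P⁻¹, then P_a, then P". The helper names are illustrative.

```python
def from_cycles(cycles):
    """Build the mapping 'number -> its replacement' from a list of brackets."""
    mapping = {}
    for cycle in cycles:
        for i, x in enumerate(cycle):
            mapping[x] = cycle[(i + 1) % len(cycle)]
    return mapping

def inverse(mapping):
    """The reciprocal permutation."""
    return {image: x for x, image in mapping.items()}

def conjugate(P, P_a):
    """The mapping x -> P(P_a(P^-1(x))), i.e. the product P P_a P^-1 under the stated convention."""
    P_inv = inverse(P)
    return {x: P[P_a[P_inv[x]]] for x in P}

P_a = from_cycles([(1, 4, 3), (2, 7), (5, 8), (6,)])   # equation (17)
P_b = from_cycles([(8, 7, 1), (3, 5), (4, 6), (2,)])   # equation (18)
P = from_cycles([(1, 8, 6, 2, 3), (4, 7, 5)])          # the P quoted in the text

# P changes the sequence 14327586 into 87135462, number by number.
relabelled = "".join(str(P[int(d)]) for d in "14327586")

print(relabelled, conjugate(P, P_a) == P_b)
```

Under this convention the conjugated mapping coincides with P_b, and P carries the bracket entries of P_a into those of P_b in order.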


64. Permutations as Constants of the Motion 


A permutation P may be considered as an observable at each instant of time and 
may therefore be considered as a dynamical variable. Let us see how P varies 
with the time. The fact that the Hamiltonian is symmetrical leads at once to 
the equation 

$$ PH = HP, \tag{20} $$


as may be verified by a similar argument to that used for equation (19), 
or alternatively by a direct application of the matrix representatives. 
Thus from (14) 


$$ (q'|PH|q'') = \int \delta(Pq' - q''')\,dq'''\,(q'''|H|q'') = (Pq'|H|q'') $$

and from (15)

$$ (q'|HP|q'') = \int (q'|H|q''')\,dq'''\,\delta(q''' - P^{-1}q'') = (q'|H|P^{-1}q''), $$

and the two right-hand sides are now equal from (1). Equation (20) shows that
each permutation is a constant of the motion. The P's are still constants when
arbitrary perturbations are applied to the system, provided the perturbing energy
to be added to the Hamiltonian is symmetrical. Thus the constancy of the P's
is absolute.

In dealing with any system in quantum mechanics, when we have found
a constant of the motion a, we know that if for any state a initially has
the numerical value a', then it always has this value, so that we can assign different
numbers a' to the different states and so obtain a classification of the states.
The procedure is not so straightforward, however, when we have several constants
of the motion a which do not commute (as is the case with our permutations P),
since we cannot assign numerical values for all the a's simultaneously to any state.
Let us first take the case of a system whose Hamiltonian does not involve the time
explicitly. The existence of constants of the motion a which do not commute is then
a sign that the system is degenerate. We must now look for a function β of the a's
which has one and the same numerical value β' for all those states belonging to one
energy-level H', so that we can use β for classifying the energy-levels of the system.
We can express the condition for β by saying that it must be a function of H,
according to the general definition of a function of an observable, so that β must
commute with every observable that commutes with H, i.e. with every constant of
the motion. If the a's are the only constants of the motion, or if they are a set that
commute with all other independent constants of the motion, our problem reduces
to finding a function β of the a's which commutes with all the a's. We can then
assign a numerical value β' for β to each energy-level of the system. If we can find
several such functions β, they must all commute with each other, so that we can
give them all numerical values simultaneously and obtain a complete classification
of the energy-levels. When the Hamiltonian involves the time explicitly one
cannot talk about energy-levels, but the β's will still give a useful classification for
the states.

We follow this method in dealing with our permutations $P$. We must find 
a function $\chi$ of the $P$'s such that $P\chi P^{-1} = \chi$ for every $P$. It is evident that 
a possible $\chi$ is $\sum P_c$, the sum of all the permutations in a certain class $c$, 
i.e. the sum of a set of similar permutations, since $\sum P P_c P^{-1}$ must consist of 
the same permutations summed in a different order. There will be one such $\chi$ 
for each class. Further, there can be no other independent $\chi$, since an arbitrary 
function of the $P$'s can be expressed as a linear function of them with numerical 
coefficients, and it will not then commute with every $P$ unless the coefficients of 
similar $P$'s are always the same. We thus obtain all the $\chi$'s that can be used for 
classifying the states. It is convenient to define each $\chi$ as an average instead of 
a sum, thus 

$$\chi_c = n_c^{-1} \sum P_c,$$

where $n_c$ is the number of $P$'s in the class $c$. An alternative expression for $\chi_c$ is 

$$\chi_c = n!^{-1} \sum_P P P_c P^{-1}, \tag{21}$$

the summation being extended over all the $n!$ permutations $P$. For each 
permutation $P$ there is one $\chi$, $\chi(P)$ say, equal to the average of all permutations 
similar to $P$. One of the $\chi$'s is $\chi(P_1) = 1$. 
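
The construction of the class averages can be checked numerically for a small case. The following is a minimal sketch for $n = 3$, assuming that a permutation is stored as a tuple of images and that a linear function of the $P$'s is stored as a dictionary of coefficients; the function names are illustrative and are not part of the theory. It confirms that the two expressions for $\chi_c$ agree and that each $\chi_c$ commutes with every $P$.

```python
from fractions import Fraction
from itertools import permutations

n = 3
group = list(permutations(range(n)))          # all n! permutations as tuples

def compose(p, q):
    """Return the permutation 'apply q first, then p'."""
    return tuple(p[q[i]] for i in range(n))

def inverse(p):
    inv = [0] * n
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def conjugate(p, x):
    """Return p x p^{-1}."""
    return compose(compose(p, x), inverse(p))

def class_of(x):
    """All permutations similar to x."""
    return frozenset(conjugate(p, x) for p in group)

def chi_from_class(x):
    """chi_c = n_c^{-1} * (sum of the permutations in the class of x)."""
    members = class_of(x)
    return {y: Fraction(1, len(members)) for y in members}

def chi_from_formula_21(x):
    """chi_c = (n!)^{-1} * sum over all P of P x P^{-1}, as in (21)."""
    total = {}
    for p in group:
        y = conjugate(p, x)
        total[y] = total.get(y, Fraction(0)) + Fraction(1, len(group))
    return total

def conjugate_element(p, element):
    """Conjugate a linear combination of permutations by p."""
    return {conjugate(p, y): coeff for y, coeff in element.items()}

classes = {class_of(x) for x in group}
for x in group:
    chi = chi_from_class(x)
    assert chi == chi_from_formula_21(x)                          # both expressions agree
    assert all(conjugate_element(p, chi) == chi for p in group)   # P chi P^{-1} = chi

# For n = 3 there are three classes: the identity, the interchanges, the cyclic permutations.
print(len(classes))
```

Exact fractions are used so that the equalities are tested without rounding.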
The constants of the motion $\chi_1, \chi_2, \ldots, \chi_m$ obtained in this way will each have 
a definite numerical value for every stationary state of the system, in the case 
when the Hamiltonian does not involve the time explicitly, and also in the general 
case can be used for classifying the states, there being one set of states for every 
permissible set of numerical values $\chi'_1, \chi'_2, \ldots, \chi'_m$ for the $\chi$'s. Since the $\chi$'s 
are absolute constants of the motion, these sets of states will be exclusive, 
i.e. transitions will never take place from a state in one set to a state in another. 
The permissible sets of values $\chi'$ that one can give to the $\chi$'s are limited by 
the fact that there exist algebraic relations between the $\chi$'s. The product of any 
two $\chi$'s, $\chi_p\chi_q$, is of course expressible as a linear function of the $P$'s, and since 
it commutes with every $P$ it must be expressible as a linear function of the $\chi$'s, 
thus 

$$\chi_p \chi_q = a_1\chi_1 + a_2\chi_2 + \dots + a_m\chi_m, \tag{22}$$

where the $a$'s are numbers. Any numerical values $\chi'$ that one gives to the $\chi$'s must 
be eigenvalues of the $\chi$'s and must satisfy these same algebraic equations. For every 
solution $\chi'$ of these equations there is one exclusive set of states. One solution 
is evidently $\chi'_p = 1$ for every $\chi_p$, and this gives the set of symmetrical states 
satisfying (13). A second obvious solution is $\chi'_p = \pm 1$, the $+$ or $-$ sign being taken 
according to whether the permutations in the class $p$ are even or odd, and this gives 
the set of antisymmetrical states. The other solutions may be worked out in any 
special case by ordinary algebraic methods, as the coefficients $a$ in (22) may be 
obtained directly by a consideration of the types of permutation to which the $\chi$'s 
concerned refer. Any solution is, apart from a certain factor, what is called in group 
theory a character of the group of permutations. The $\chi$'s are all real observables, 
since each $P$ and its conjugate complex $P^{-1}$ are similar and will occur added 
together in the definition of any $\chi$, so that the $\chi'$'s must be all real numbers. 
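
The algebraic relations (22) and their two obvious solutions can be worked out explicitly in a small case. The sketch below does this for $n = 3$ under the same illustrative conventions as any other small-group calculation (permutations as tuples, linear functions of the $P$'s as dictionaries); the helper names `coefficients` and `satisfies_22` are hypothetical. It computes the coefficients $a$ in (22) for every pair of classes and tests whether a proposed set of values $\chi'$ satisfies the resulting equations.

```python
from fractions import Fraction
from itertools import permutations

n = 3
group = list(permutations(range(n)))

def compose(p, q):
    return tuple(p[q[i]] for i in range(n))

def inverse(p):
    inv = [0] * n
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def sign(p):
    """+1 for an even permutation, -1 for an odd one (by counting inversions)."""
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return -1 if inversions % 2 else 1

# Split the group into classes of similar permutations.
classes = []
for x in group:
    cls = frozenset(compose(compose(p, x), inverse(p)) for p in group)
    if cls not in classes:
        classes.append(cls)

def chi(cls):
    """Class average as a dict {permutation: coefficient}."""
    return {y: Fraction(1, len(cls)) for y in cls}

def multiply(u, v):
    """Product of two linear combinations of permutations."""
    out = {}
    for x, cx in u.items():
        for y, cy in v.items():
            z = compose(x, y)
            out[z] = out.get(z, Fraction(0)) + cx * cy
    return out

def coefficients(element):
    """Expand a function that is constant on classes as sum_r a_r chi_r."""
    coeffs = []
    for cls in classes:
        values = {element.get(y, Fraction(0)) for y in cls}
        assert len(values) == 1              # constant on each class
        coeffs.append(values.pop() * len(cls))
    return coeffs

def satisfies_22(values):
    """Check chi'_p chi'_q = sum_r a_r chi'_r for every pair p, q."""
    for p, cp in enumerate(classes):
        for q, cq in enumerate(classes):
            a = coefficients(multiply(chi(cp), chi(cq)))
            if values[p] * values[q] != sum(ar * vr for ar, vr in zip(a, values)):
                return False
    return True

symmetric = [Fraction(1) for _ in classes]
antisymmetric = [Fraction(sign(next(iter(cls)))) for cls in classes]
print(satisfies_22(symmetric), satisfies_22(antisymmetric))
```

For $n = 3$ the only remaining solution assigns 1 to the identity, 0 to the interchanges and $-\tfrac12$ to the cyclic permutations, which may be confirmed with the same function.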

The number of possible solutions of the equations (22) may easily be 
determined, since it must equal the number of different eigenvalues of an arbitrary 
function $B$ of the $\chi$'s. We can express $B$ as a linear function of the $\chi$'s with 
the help of equations (22); thus 

$$B = b_1\chi_1 + b_2\chi_2 + \dots + b_m\chi_m. \tag{23}$$

Similarly we can express each of the quantities $B^2, B^3, \ldots, B^m$ as a linear function 
of the $\chi$'s. From these $m$ equations, together with the equation $\chi(P_1) = 1$, we can 
eliminate the $m$ unknowns $\chi_1, \chi_2, \ldots, \chi_m$, obtaining as result an algebraic equation 
of degree $m$ for $B$, 

$$B^m + c_1 B^{m-1} + c_2 B^{m-2} + \dots + c_m = 0.$$

The $m$ solutions of this equation give the $m$ possible eigenvalues for $B$, 
each of which will, according to (23), be a linear function of $b_1, b_2, \ldots, b_m$, whose 
coefficients are a permissible set of values $\chi'_1, \chi'_2, \ldots, \chi'_m$. These sets of values 
$\chi'$ thus obtained must be all different, since if there were fewer than $m$ different 
permissible sets of values $\chi'$ for the $\chi$'s there would exist a linear function of 
the $\chi$'s every one of whose eigenvalues vanishes, which would mean that the linear 
function itself vanishes and the $\chi$'s are not linearly independent. Thus the number 
of permissible sets of numerical values for the $\chi$'s is just equal to $m$, which is 
the number of classes of permutations or the number of partitions of $n$. This 
number is therefore the number of exclusive sets of states. 
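
The equality between the number of classes and the number of partitions of $n$ can be verified by direct enumeration for small $n$. The sketch below assumes the standard result that two permutations are similar precisely when they have the same cycle lengths, so that classes may be counted by cycle type; the function names are illustrative.

```python
from itertools import permutations

def cycle_type(p):
    """Sorted cycle lengths of the permutation p (a tuple mapping i -> p[i])."""
    seen, lengths = set(), []
    for start in range(len(p)):
        if start in seen:
            continue
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            i = p[i]
            length += 1
        lengths.append(length)
    return tuple(sorted(lengths, reverse=True))

def number_of_classes(n):
    """Count the distinct cycle types occurring among the n! permutations."""
    return len({cycle_type(p) for p in permutations(range(n))})

def number_of_partitions(n, largest=None):
    """Count the ways of writing n as a sum of positive integers, order ignored."""
    if largest is None:
        largest = n
    if n == 0:
        return 1
    return sum(number_of_partitions(n - k, k) for k in range(1, min(n, largest) + 1))

for n in range(1, 7):
    print(n, number_of_classes(n), number_of_partitions(n))
```

The two counts agree for each $n$ tried, which gives the number $m$ of exclusive sets of states for that number of similar particles.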

The properties of the $P$'s which are not properties of the $\chi$'s will only describe 
the degeneracy of the states, in the case of a system whose Hamiltonian does not 
involve the time explicitly. If $\psi$ denotes any stationary state, $f(P)\psi$, where $f(P)$ 
is any function of the permutations, will denote another stationary state belonging 
to the same energy-level, except when it vanishes identically. By expanding 
$f(P)\psi$ in terms of a complete set of independent stationary states belonging to 
this energy-level, we get a representation of $f(P)$ and thus of each $P$. In this way 
we see that, if we obtain a matrix representation of all the $P$'s consistent with 
each of the $\chi$'s being a certain number $\chi'$, then the number of rows and columns 
of the matrices will be the degree of degeneracy of the states in the exclusive 
set $\chi'$, i.e. the number of independent states belonging to each energy-level. 
This degeneracy is an essential one and cannot be removed by any perturbation 
that is symmetrical between all the similar particles. The states $\psi$ and $f(P)\psi$ are 
observationally indistinguishable, since any observation that can actually be made 
must consist in measuring an observable that is symmetrical between the similar 
particles and therefore commutes with $f(P)$. This remark applies also when 
the Hamiltonian involves the time explicitly. 


65. Determination of Energy-levels 


Let us apply the perturbation method of §51 and make a first-order calculation of 
the energy-levels in the case when the Hamiltonian does not involve the time 
explicitly. We suppose that for our unperturbed states each of the similar 
particles has its own 'orbit', represented by a wave function $(q'|\alpha)$ involving 
only the co-ordinates $q'$ of this one particle. We shall have altogether $n$ orbits, 
one for each particle, which we assume for the present to be all different, 
and label $\alpha_1, \alpha_2, \ldots, \alpha_n$. The wave function representing an unperturbed state 
of the whole system will then be the product (10). If we apply an arbitrary 
permutation $P_a$ to the $\alpha$'s, we shall obtain another wave function 

$$(q'_1|\alpha_r)(q'_2|\alpha_s) \cdots (q'_n|\alpha_t) = (q'|P_a\alpha) \tag{24}$$

representing another unperturbed state with the same energy. There are thus 
altogether $n!$ unperturbed states with this energy, if we assume there are no other 
causes of degeneracy. According to the method of §51 when the unperturbed 
system is degenerate, we must consider those elements of the matrix representing 
the perturbing energy $V$ that refer to two states with the same energy, i.e. those of 
the type $(P_a\alpha|V|P_b\alpha)$ where $P_a$ and $P_b$ are two permutations of the $\alpha$'s. These will 
form a matrix with $n!$ rows and columns, whose eigenvalues are the first-order 
corrections in the energy-levels. 

It is necessary in the present discussion to distinguish between the two kinds of 
permutations, those of the $q$'s and those of the $\alpha$'s. The essential difference between 
them can perhaps be seen most clearly in the following way. Let us consider 
a permutation in the general case, say that consisting of the interchange of 2 
and 3. This may be interpreted either as the interchange of the objects 2 and 3 
or as the interchange of the objects in the places 2 and 3, these two operations 
producing in general quite different results. The first of these interpretations 
is the one we have been using up to the present, the objects concerned being 
the $q$'s in the representative of a state. A permutation with this interpretation can 
be applied to an arbitrary function of the $q$'s. A permutation with the second 
interpretation has a meaning, however, when applied to a function of the $q$'s 
only if each of the $q$'s has a definite specifiable place in the function. This is 
not the case for a general function of the $q$'s, but it is the case for any of 
the $n!$ functions of the type (24), the place of each $q$ being specified by the $\alpha$ 
with which it is bracketed. Any permutation applied to the $q$'s in given places 
now produces the same result as the reciprocal permutation applied to the $\alpha$'s. 
A permutation of the $q$'s (i.e. one with the first interpretation), since it can be 
applied to any function of the $q$'s, i.e. to the representative of any $\psi$-symbol, may be 
regarded as an ordinary observable. On the other hand, a permutation of places 
or of the $\alpha$'s can be considered as an observable only in a very restricted sense, 
since it has a meaning only when multiplied into a $\psi$-symbol whose representative 
is one of the $n!$ wave functions (24) or some linear combination of them. We denote 
such a permutation of the $\alpha$'s, considered as an observable in this restricted sense, 
by the symbol $P^\alpha$. 

We can form algebraic functions of the observables $P^\alpha$ which will be other 
observables in the same restricted sense. In particular we can form $\chi(P_a^\alpha)$, 
the average of all $P^\alpha$'s similar to $P_a^\alpha$. This must equal $\chi(P_a)$, the average of 
the similar permutations of the $q$'s, since the total set of all permutations of 
a given type must evidently be the same whether the permutations are applied 
to the objects $q$ or to the places $\alpha$. 

If we set up arbitrarily a one-one correspondence between the $q$'s and the $\alpha$'s, 
as is done automatically when we label both the $q$'s and the $\alpha$'s by the numbers 
$1, 2, 3, \ldots, n$, as in (10), then, if we have any permutation of the $q$'s, we can give 
a meaning to this same permutation of the $\alpha$'s. This meaning is such that 

$$(q|\alpha) = (Pq|P\alpha).$$

In this equation we can apply a permutation $P_a$ to the $\alpha$'s on both sides, which will 
give us 

$$(q|P_a\alpha) = (Pq|P_aP\alpha), \tag{25}$$

an equation which shows us the connexion between permutations of the $q$'s and 
those of the $\alpha$'s when applied to the wave function (24). 

The matrix $(P_a\alpha|V|P_b\alpha)$, which we must now study, may be obtained from 
the matrix $(q'|V|q'')$ representing $V$ by a canonical transformation, in which 
the transformation functions are just $(q'|P_a\alpha)$, the wave function (24), and its 
conjugate complex $(P_a\alpha|q')$, provided these functions are properly normalized. 
Thus 

$$(P_a\alpha|V|P_b\alpha) = \iint (P_a\alpha|q')\,dq'\,(q'|V|q'')\,dq''\,(q''|P_b\alpha). \tag{26}$$

Again, for arbitrary $P$, 

$$(P_aP\alpha|V|P_bP\alpha) = \iint (P_aP\alpha|q')\,dq'\,(q'|V|q'')\,dq''\,(q''|P_bP\alpha)$$
$$= \iint (P_aP\alpha|Pq')\,dq'\,(Pq'|V|Pq'')\,dq''\,(Pq''|P_bP\alpha),$$

when we apply the permutation $P$ to the variables of integration $q'$ and $q''$. With 
the help of (25), this reduces to 

$$(P_aP\alpha|V|P_bP\alpha) = \iint (P_a\alpha|q')\,dq'\,(Pq'|V|Pq'')\,dq''\,(q''|P_b\alpha). \tag{27}$$

Now since $V$ is symmetrical between all the particles, we must have 

$$(q'|V|q'') = (Pq'|V|Pq''),$$

like (1), and hence, comparing (26) and (27), we obtain 

$$(P_a\alpha|V|P_b\alpha) = (P_aP\alpha|V|P_bP\alpha). \tag{28}$$




Let $(P\alpha|V|\alpha) = V_P$ for brevity. Then, taking $P = P_b^{-1}$ in (28), we obtain 

$$(P_a\alpha|V|P_b\alpha) = (P_aP_b^{-1}\alpha|V|\alpha) = V_{P_aP_b^{-1}}.$$

Thus the general matrix element $(P_a\alpha|V|P_b\alpha)$ depends only on the ratio $P_aP_b^{-1}$, 
and of the total of $(n!)^2$ matrix elements there are only $n!$ different ones. 
The coefficient of any $V_P$ in this matrix will be a matrix, each of whose elements 
is 0 or 1, the 1 occurring when 

$$(P_a\alpha|V|P_b\alpha) = V_P,$$

i.e. when $P_aP_b^{-1} = P$. But this matrix, multiplied into any wave function $(q|P_b\alpha)$, 
gives the result $(q|P_a\alpha)$ with $P_aP_b^{-1} = P$, i.e. it gives the result $(q|PP_b\alpha)$, so that 
it is precisely the matrix representing the observable $P^\alpha$ or the permutation $P$ 
applied to the $\alpha$'s. Thus the whole matrix $(P_a\alpha|V|P_b\alpha)$ is equal to the matrix 
representing $\sum_P V_P P^\alpha$, where the summation is over all the $n!$ permutations $P$, 
and we can put 

$$V = \sum_P V_P P^\alpha. \tag{29}$$


This formula shows that the perturbing energy $V$ is equal to a linear function 
of the permutation observables $P^\alpha$ with numerical coefficients $V_P$. It is, of course, 
only an approximate formula, as it holds only with neglect of those matrix 
elements of $V$ that refer to two different energy-levels of the unperturbed system. 
It can, however, be used for the calculation of the energy-levels in the first 
approximation, and is very convenient for this purpose as the expression $\sum_P V_P P^\alpha$ 
is easily handled. This expression, it should be remembered, is an observable only 
in the restricted sense mentioned above, but this sense is sufficiently general for 
equation (29) to be valid with neglect of those matrix elements of $V$ referring to 
two different energy-levels of the unperturbed system. 
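
The content of (29) can be illustrated for $n = 3$ by building the matrix with $n!$ rows and columns in two ways. In the sketch below the coefficients $V_P$ are purely hypothetical numbers chosen for illustration, and the names `direct` and `summed` are not taken from the text. The first matrix has the element $V_{P_aP_b^{-1}}$ in row $P_a$ and column $P_b$; the second is the sum over $P$ of $V_P$ times the matrix of 0's and 1's that carries $(q|P_b\alpha)$ into $(q|PP_b\alpha)$.

```python
from itertools import permutations

n = 3
group = list(permutations(range(n)))
index = {p: i for i, p in enumerate(group)}
size = len(group)

def compose(p, q):
    """Return the permutation 'apply q first, then p'."""
    return tuple(p[q[i]] for i in range(n))

def inverse(p):
    inv = [0] * n
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

# Hypothetical numerical coefficients V_P, one per permutation (illustrative values only).
V = {p: float(i + 1) for i, p in enumerate(group)}

# Matrix element (P_a alpha|V|P_b alpha) = V_{P_a P_b^{-1}}.
direct = [[V[compose(pa, inverse(pb))] for pb in group] for pa in group]

def permutation_matrix(p):
    """Matrix with a 1 in row P_a, column P_b whenever P_a = P P_b, and 0 elsewhere."""
    m = [[0.0] * size for _ in range(size)]
    for pb in group:
        m[index[compose(p, pb)]][index[pb]] = 1.0
    return m

# Sum over P of V_P times the matrix representing P^alpha.
summed = [[0.0] * size for _ in range(size)]
for p in group:
    mp = permutation_matrix(p)
    for i in range(size):
        for j in range(size):
            summed[i][j] += V[p] * mp[i][j]

distinct_values = {direct[i][j] for i in range(size) for j in range(size)}
print(direct == summed, size * size, len(distinct_values))
```

The two matrices coincide, and of the $(n!)^2 = 36$ elements only $n! = 6$ are different, as stated above.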

As an example of an application of (29) we shall determine the average energy 
of all those states arising from a given state of the unperturbed system that belong 
to one exclusive set. This requires us to calculate the average eigenvalue of $V$ when 
the $\chi$'s have specified numerical values $\chi'$. Now the average eigenvalue of $P^\alpha$ equals 
that of $P_a^\alpha P^\alpha (P_a^\alpha)^{-1}$ for arbitrary $P_a^\alpha$ and thus equals that of 
$n!^{-1}\sum_{P_a} P_a^\alpha P^\alpha (P_a^\alpha)^{-1}$, 
which is $\chi'(P^\alpha)$ or $\chi'(P)$. Hence the average eigenvalue of $V$ is $\sum_P V_P\chi'(P)$. 
A similar method could be used for calculating the average eigenvalue of any 
function of $V$, it being only necessary to replace each $P^\alpha$ by $\chi(P)$ to perform 
the averaging. 

The number of energy-levels in an exclusive set $\chi = \chi'$ that arise from a given 
state of the unperturbed system is equal to the number of eigenvalues of (29) that 
are consistent with the equations $\chi = \chi'$. This number is the number of rows 
and columns in a representation of the $P$'s in which each $\chi = \chi'$, which number, 
from the result at the end of the preceding section, is just the degree of degeneracy 
of the states in this set. 

The modifications required in the theory when the orbits $\alpha_1, \alpha_2, \ldots, \alpha_n$ 
of the undisturbed system are not all different may easily be made. 
Suppose, for example, that $\alpha_1$ and $\alpha_2$ are the same. Then the permutation $P^\alpha_{12}$ 
that causes an interchange of $\alpha_1$ and $\alpha_2$ must equal unity. Only functions of 
the $P^\alpha$'s that commute with $P^\alpha_{12}$ now have a meaning. This, however, is sufficient 
for us to be able to follow out the same sort of argument as before, and obtain 
a result of the same form (29). The term in the summation in (29) that involves 
the permutation $P^\alpha_{12}$ now does not occur, since it could be added on to the term 
involving the identical permutation $P^\alpha_1$. For the remaining terms, any two terms 
$P^\alpha_a$ and $P^\alpha_b$ must have the same coefficient if the permutations $P^\alpha_a$ and $P^\alpha_b$ can 
be obtained from one another by the interchange of $\alpha_1$ and $\alpha_2$. This results in 
$\sum_P V_P P^\alpha$ commuting with $P^\alpha_{12}$ and thus having a meaning. The condition $P^\alpha_{12} = 1$ 
will impose restrictions on the possible numerical values $\chi'$ that the $\chi$'s can have 
and will reduce the number of characters. 


66. Application to Electrons 


Let us now consider the case when the similar particles are electrons. This requires, 
according to Pauli’s exclusion principle discussed in §62, that we take into account 
only the antisymmetrical states. It is now necessary to make explicit reference 
to the spin properties of the electrons. The effect of the spin on the motion of 
an electron in an electromagnetic field is not very great. There will be additional 
forces on the electron due to its magnetic moment, requiring additional terms in 
the Hamiltonian. The spin angular momentum will not have any direct action 
on the motion, but it will come into play when there are forces tending to rotate 
the magnetic moment, since the magnetic moment and angular momentum are 
constrained to be always in the same direction. These effects are all small, 
however, of the same order of magnitude as that of the relativity variation of 
mass with velocity, so there would be no point in taking them into account in 
a non-relativity theory. The importance of the spin lies not in these small effects 
on the motion of the electron, but in the fact that it gives two internal states to 
the electron, corresponding to the two possible values of the spin component in any 
assigned direction, which causes a doubling in the number of independent states of 
an electron moving in a given field. This fact has far-reaching consequences when 
combined with Pauli’s exclusion principle. 

Let us take a representation in which the diagonal observables $q_r$ describing the 
$r$-th electron are its three Cartesian co-ordinates $x, y, z$, and the $z$-component $\sigma_z$ 
of its spin vector $\sigma$, which was introduced in §43. The representative of a state 
will now be 

$$(x_1, x_2, \ldots, x_n;\ \sigma_1, \sigma_2, \ldots, \sigma_n|) = (x; \sigma|), \tag{30}$$

the single variable $x$ being written instead of $x, y, z$ and the suffix $z$ being dropped 
from $\sigma_z$'s that occur in representatives. The exclusion principle requires that (30) 
shall be antisymmetrical in the $x$'s and $\sigma$'s together, i.e. if any permutation is 
applied to the $x$'s and also to the $\sigma$'s, (30) must remain unchanged or change sign 
according to whether the permutation is even or odd. In symbols 

$$(x; \sigma|) = \pm(Px; P\sigma|) \tag{31}$$

for any permutation $P$. Thus even if we neglect the spin forces in the Hamiltonian, 
we must take the spin variables into account in order to determine what states are 
allowed by the exclusion principle. 

If the theory of the three preceding sections is applied directly to the case 
of electrons, it will not give anything of interest, since all the allowed states 
are eigenstates of any permutation belonging to the eigenvalue $\pm 1$. We may, 
however, consider permutations $P$ which operate on the $x$-variables alone in 
the representative of a state, and apply our theory to these. Such permutations 
may also be considered as observables. Further, they are also constants of 
the motion when we neglect the terms in the Hamiltonian that arise from 
the spin forces, which neglect results in the Hamiltonian not involving the spin 
observables $\sigma$. Hence with these permutations $P$ we can again introduce the $\chi$'s, 
equal to the average of all of the $P$'s in each class, and assert that for any 
permissible set of numerical values $\chi'$ for the $\chi$'s there will be one exclusive set 
of states. Thus there exist these exclusive sets of states for systems containing 
many electrons even when we restrict ourselves to a consideration of only those 
states that satisfy Pauli's principle. The exclusiveness of the sets of states is now, 
of course, only approximate, since the $\chi$'s are constants only so long as we neglect 
the spin forces. There will actually be a small probability for a transition from 
a state in one set to a state in another. 

From (31) we obtain 

$$PP^\sigma = \pm 1, \tag{32}$$

where $P$ denotes any permutation which operates on the $x$-variables and $P^\sigma$ 
the same permutation operating on the $\sigma$-variables in the representative of 
a state. There is thus a simple connexion between the $P$'s and $P^\sigma$'s, which 
means that instead of studying the observables $P$ we can get all the results 
we want, e.g. the characters $\chi'$, by studying the observables $P^\sigma$. The $P^\sigma$'s are 
much easier to study on account of the fact that the $\sigma$ variables in the wave 
function have domains consisting each of only the two points $1$ and $-1$, which are 
the two eigenvalues of each $\sigma_z$. This fact results in there being fewer characters $\chi'$ 
for the group of permutations of the $\sigma$-variables than for the group of general 
permutations, since it prevents a function of the variables $\sigma_1, \sigma_2, \ldots, \sigma_n$ from 
being antisymmetrical in more than two of them. 

The study of the observables $P^\sigma$ is made specially easy by the fact that we can 
express them as algebraic functions of the observables $\sigma$. Consider the quantity 

$$O_{12} = \tfrac12\{1 + (\sigma_1, \sigma_2)\}.$$

With the help of equations (42) of §43 we find readily that 

$$(\sigma_1, \sigma_2)^2 = (\sigma_{1x}\sigma_{2x} + \sigma_{1y}\sigma_{2y} + \sigma_{1z}\sigma_{2z})^2 = 3 - 2(\sigma_1, \sigma_2), \tag{33}$$

and hence that 

$$O_{12}^2 = \tfrac14\{1 + 2(\sigma_1, \sigma_2) + (\sigma_1, \sigma_2)^2\} = 1. \tag{34}$$

Again, we find 

$$O_{12}\sigma_{1x} = \tfrac12\{\sigma_{1x} + \sigma_{2x} - i\sigma_{1z}\sigma_{2y} + i\sigma_{1y}\sigma_{2z}\},$$
$$\sigma_{2x}O_{12} = \tfrac12\{\sigma_{2x} + \sigma_{1x} + i\sigma_{1y}\sigma_{2z} - i\sigma_{1z}\sigma_{2y}\},$$

and hence 

$$O_{12}\sigma_{1x} = \sigma_{2x}O_{12}.$$

Similar relations hold for $\sigma_{1y}$ and $\sigma_{1z}$, so that we have 

$$O_{12}\sigma_1 = \sigma_2 O_{12}$$

or 

$$O_{12}\sigma_1 O_{12}^{-1} = \sigma_2.$$

From this we can obtain with the help of (34) 

$$O_{12}\sigma_2 O_{12}^{-1} = \sigma_1.$$

These commutability relations for $O_{12}$ with $\sigma_1$ and $\sigma_2$ are precisely the same as 
those for $P^\sigma_{12}$, the permutation consisting of the interchange of the spin variables 
of electrons 1 and 2. Thus we can put 

$$O_{12} = cP^\sigma_{12},$$

where $c$ is a number. Equation (34) shows that $c = \pm 1$. To determine which 
of these values for $c$ is the correct one, we observe that the eigenvalues of $P^\sigma_{12}$ 
are $1, 1, 1$ and $-1$, corresponding to the fact that there exist three independent 
symmetrical and one antisymmetrical function of the two variables $\sigma_{1z}$, $\sigma_{2z}$, 
namely, with the notation of §43, the three symmetrical functions $f_\alpha(\sigma_1)f_\alpha(\sigma_2)$, 
$f_\beta(\sigma_1)f_\beta(\sigma_2)$, $f_\alpha(\sigma_1)f_\beta(\sigma_2) + f_\beta(\sigma_1)f_\alpha(\sigma_2)$, and the one antisymmetrical function 
$f_\alpha(\sigma_1)f_\beta(\sigma_2) - f_\beta(\sigma_1)f_\alpha(\sigma_2)$. Thus the mean of the eigenvalues of $P^\sigma_{12}$ is $\tfrac12$. 
Now the mean of the eigenvalues of $(\sigma_1, \sigma_2)$ is evidently zero and hence the mean 
of the eigenvalues of $O_{12}$ is $\tfrac12$. Thus we must have $c = +1$, and so we can put 

$$P^\sigma_{12} = \tfrac12\{1 + (\sigma_1, \sigma_2)\}.$$
In this way any permutation $P^\sigma$ consisting simply of an interchange can be 
expressed as an algebraic function of the $\sigma$'s. Any other permutation $P^\sigma$ can 
be expressed as a product of interchanges and can therefore also be expressed 
as a function of the $\sigma$'s. With the help of (32) we can now express the $P$'s as 
algebraic functions of the $\sigma$'s and eliminate the $P^\sigma$'s from the discussion. We have, 
since the $-$ sign must be taken in (32) when the permutations are interchanges and 
since the square of an interchange is unity, 

$$P_{12} = -\tfrac12\{1 + (\sigma_1, \sigma_2)\}. \tag{35}$$
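
The identification of $\tfrac12\{1 + (\sigma_1, \sigma_2)\}$ with the interchange of the two spin variables can be confirmed with explicit matrices. The sketch below assumes the usual two-rowed representation of the components of $\sigma$ and the ordering of the four product states in which the first spin is the more significant index; the helper functions `kron`, `matmul`, `add`, `scale` and `equal` are written out by hand and are illustrative only.

```python
# Two-rowed matrices for the components of sigma (assumed standard representation).
SX = [[0, 1], [1, 0]]
SY = [[0, -1j], [1j, 0]]
SZ = [[1, 0], [0, -1]]
I2 = [[1, 0], [0, 1]]

def kron(a, b):
    """Kronecker product: a acts on the first spin, b on the second."""
    return [[a[i][j] * b[k][l] for j in range(len(a)) for l in range(len(b))]
            for i in range(len(a)) for k in range(len(b))]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(c, a):
    return [[c * x for x in row] for row in a]

def equal(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

I4 = kron(I2, I2)

# (sigma_1, sigma_2) = sigma_1x sigma_2x + sigma_1y sigma_2y + sigma_1z sigma_2z
dot = [[0] * 4 for _ in range(4)]
for s in (SX, SY, SZ):
    dot = add(dot, kron(s, s))

O12 = scale(0.5, add(I4, dot))

# Interchange of the two spin variables: the product state (a, b) goes to (b, a).
SWAP = [[1 if (row // 2, row % 2) == (col % 2, col // 2) else 0 for col in range(4)]
        for row in range(4)]

relation_33 = equal(matmul(dot, dot), add(scale(3, I4), scale(-2, dot)))
relation_34 = equal(matmul(O12, O12), I4)
moves_spin = equal(matmul(O12, kron(SX, I2)), matmul(kron(I2, SX), O12))
is_interchange = equal(O12, SWAP)
print(relation_33, relation_34, moves_spin, is_interchange)
```

All four checks hold, so that in this representation $O_{12}$ is the interchange itself, in agreement with $c = +1$.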


The formula (35) may conveniently be used for the evaluation of 
the characters $\chi'$ which define the exclusive sets of states. We have, for example, 
for the permutations consisting of interchanges 

$$\chi_{12} = \chi(P_{12}) = -\tfrac12\Big\{1 + \frac{2}{n(n-1)}\sum_{r<t}(\sigma_r, \sigma_t)\Big\}.$$

If we introduce the observable $s$ to describe the magnitude of the total spin angular 
momentum, $\tfrac12\sum_r\sigma_r$, in units of $h$, through the formula 

$$s^2 - \tfrac14 = \Big(\tfrac12\sum_r\sigma_r,\ \tfrac12\sum_t\sigma_t\Big),$$

analogous to equation (12) of Chapter VIII, we have 

$$2\sum_{r<t}(\sigma_r, \sigma_t) = \sum_r\sum_t(\sigma_r, \sigma_t) - \sum_r(\sigma_r, \sigma_r) = 4s^2 - 1 - 3n.$$

Hence 

$$\chi_{12} = -\frac{n(n-4) + 4s^2 - 1}{2n(n-1)}. \tag{36}$$

Thus $\chi_{12}$ is expressible as a function of the observable $s$ and of $n$ the number of 
electrons. Any of the other $\chi$'s could be evaluated on similar lines and would have 
to be a function of $s$ and $n$ only, since there are no other symmetrical functions 
of all the $\sigma$ observables which could be involved. There is therefore one set of 
numerical values $\chi'$ for the $\chi$'s, and thus one exclusive set of states, for each 
eigenvalue $s'$ of $s$. The eigenvalues of $s$ are 

$$\tfrac12 n + \tfrac12,\quad \tfrac12 n - \tfrac12,\quad \tfrac12 n - \tfrac32,\quad \ldots,$$

the series terminating with $\tfrac12$ or $1$. 
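
The dependence of $\chi_{12}$ on $s$ and $n$ can be tabulated directly. The sketch below takes $s$ to be defined by $s^2 - \tfrac14 = (\tfrac12\sum_r\sigma_r, \tfrac12\sum_t\sigma_t)$, as in this section, so that $2\sum_{r<t}(\sigma_r, \sigma_t) = 4s^2 - 1 - 3n$; the function names are illustrative.

```python
from fractions import Fraction

def s_eigenvalues(n):
    """Eigenvalues of s for n electrons: n/2 + 1/2, n/2 - 1/2, ..., ending at 1/2 or 1."""
    values = []
    s = Fraction(n + 1, 2)
    while s > 0:
        values.append(s)
        s -= 1
    return values

def chi_12(n, s):
    """Average of the interchanges, chi_12, as a function of n and s."""
    s = Fraction(s)
    two_sum = 4 * s * s - 1 - 3 * n           # 2 * sum over r < t of (sigma_r, sigma_t)
    return -Fraction(1, 2) * (1 + two_sum / (n * (n - 1)))

for n in (2, 3, 4):
    for s in s_eigenvalues(n):
        print(n, s, chi_12(n, s))
```

For $n = 2$ the two values of $\chi_{12}$ are $-1$ for $s' = \tfrac32$ and $+1$ for $s' = \tfrac12$, corresponding to wave functions antisymmetrical and symmetrical in the $x$-variables respectively.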

We see in this way that each of the stationary states of a system with several 
electrons is an eigenstate of $s$, the magnitude in units of $h$ of the total spin angular 
momentum $\tfrac12\sum_r\sigma_r$, belonging to a definite eigenvalue $s'$. For any given $s'$ there 
will be $2s'$ possible values for a component of the total spin vector in any direction 
and these will correspond to $2s'$ independent stationary states with the same 
energy. When we do not neglect the forces due to the spin magnetic moments 
these $2s'$ states will in general be split up into $2s'$ states with slightly different 
energies, and will thus form a multiplet of multiplicity $2s'$. Transitions in which 
$s'$ changes, i.e. transitions from one multiplicity to another, cannot occur when 
the spin forces are neglected and will have only a small probability of occurrence 
when the spin forces are not neglected. 

We can determine the energy-levels of a system with several electrons to the first 
approximation by using formula (29). If we consider only the Coulomb forces 
between the electrons, then the interaction energy $V$ will consist of a sum of 
parts each referring to only two electrons, which will result in all the matrix 
elements $V_P$ vanishing except those for which $P$ is the identical permutation or is 
simply an interchange of two electrons. Thus (29) will reduce to 

$$V = V_1 + \sum_{r<s} V_{rs}P^\alpha_{rs}, \tag{37}$$

$V_{rs}$ being the matrix element referring to the interchange of orbits $r$ and $s$. 
Since the $P^\alpha$'s have the same properties as the $P$'s, any function of the $P^\alpha$'s 
will have the same eigenvalues as the corresponding function of the $P$'s, so that 
the right-hand side of (37) will have the same eigenvalues as 

$$V_1 + \sum_{r<s} V_{rs}P_{rs} = V_1 - \tfrac12\sum_{r<s} V_{rs}\{1 + (\sigma_r, \sigma_s)\}, \tag{38}$$

from (35). The eigenvalues of (38) will give the first-order corrections 
in the energy-levels. The form of (38) shows that a model which assumes a coupling 
energy between the spins of the various electrons, of magnitude $-\tfrac12 V_{rs}(\sigma_r, \sigma_s)$ for 
the electrons in the $r$ and $s$ orbits, would meet with a fair amount of success. 
This coupling energy is much greater than that of the spin magnetic moments. 
Such models of the atom were in use before the justification by quantum mechanics 
was obtained. 
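
As a worked illustration of (38), consider the simplest case $n = 2$ with the two orbits different. This is only a sketch obtained by combining (38) with the relation $2\sum_{r<t}(\sigma_r, \sigma_t) = 4s^2 - 1 - 3n$ of this section; the names singlet and triplet are the usual spectroscopic ones and are supplied here for orientation.

$$(\sigma_1, \sigma_2) = \tfrac12(4s^2 - 7) = \begin{cases} -3, & s' = \tfrac12, \\ +1, & s' = \tfrac32, \end{cases}$$

$$V_1 - \tfrac12 V_{12}\{1 + (\sigma_1, \sigma_2)\} = \begin{cases} V_1 + V_{12}, & s' = \tfrac12 \ \text{(singlet)}, \\ V_1 - V_{12}, & s' = \tfrac32 \ \text{(triplet)}. \end{cases}$$

The two multiplicities are thus separated by $2V_{12}$ in the first approximation, which is the effect that the spin-coupling model reproduces.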

If two of the orbits of our unperturbed system are the same, say the orbits $\alpha_1$ 
and $\alpha_2$ are the same, we must take only those eigenvalues of (37) that are consistent 
with $P^\alpha_{12} = 1$, or those eigenvalues of (38) consistent with $P_{12} = 1$ or $P^\sigma_{12} = -1$. 
This means we must take only those eigenvalues of (38) belonging to eigenfunctions 
that are simultaneously eigenfunctions of $P^\sigma_{12}$ belonging to the eigenvalue $-1$, 
i.e. eigenfunctions that are antisymmetrical in $\sigma_1$ and $\sigma_2$. Thus we may say that 
the two electrons in the orbits $\alpha_1$ and $\alpha_2$ have their spins antiparallel. The case of 
more than two orbits the same cannot occur with electrons. 


XII. THEORY OF RADIATION 


67. Theory of Einstein-Bose Assemblies 


IN Chapter X a theory was given of the scattering, absorption and emission 
of a particle by an atomic system. The interaction of the particle and atomic 
system was assumed to be describable by an interaction energy V appearing in 
the Hamiltonian, which interaction energy had to be small but was otherwise 
arbitrary. If we could determine the energy of interaction between a photon and 
an atom or molecule, we could apply the methods of Chapter X immediately to 
the case when the incident particle is a photon. In this way we could obtain 
a theory of the interaction of light with an atomic system. We cannot determine 
this energy of interaction directly from analogy with the classical theory, in the way 
we obtained the Hamiltonians for most of the systems dealt with up to the present, 
since the phenomenon of the interaction of a photon with an atom has no analogue 
in the classical theory. We must proceed in a more indirect way. We know that 
the interaction of an atom with a field of radiation can be described approximately 
by classical electrodynamics when the field of radiation consists of a large number 
of photons. Our method is therefore to assume an arbitrary interaction energy V 
between a single photon and the atom and then in terms of V to investigate 
the interaction of a large number of photons with the atom. By comparing 
this interaction with that given by classical electrodynamics we can then obtain V. 
Our problem now is thus to deal in general terms with the interaction of 
a large number of photons with an atom. This problem, it is important to observe, 
is a generalization of that of Chapter X, in spite of the fact that we then often 
considered a large number of incident particles. The incident particles of Chapter X 
were all independent and each had its own scatterer. In fact they were only 
introduced to help us to picture one actual incident particle interacting with one 
scatterer. We now have a large number of actual photons all interacting with 
the same atom. Also our photons are not independent of one another since, even if 
there are no forces between them describable by an interaction energy, they are, 
as we saw in the preceding chapter, such that only states that are symmetrical 
between them occur in nature, i.e. they satisfy the Einstein-Bose statistics. 


Let us first consider the problem of an assembly of $n$ similar systems of 
any kind that satisfy the Einstein-Bose statistics and are all perturbed by some 
external field of force. If we take a representation in which sets of observables 
$q_1, q_2, \ldots, q_n$ describing the first, second, ... last system respectively, are diagonal, 
the representative $(q'_1q'_2 \ldots q'_n|)$ of any state must be symmetrical in the variables 
$q'_1, q'_2, \ldots, q'_n$. Suppose the eigenvalues of any of the $q$'s, $q_r$ say, are $q^{(1)}, q^{(2)}, 
q^{(3)}, \ldots$, which we assume for definiteness to be discrete. These eigenvalues must 
be the same for each of the $n$ systems, i.e. they must be independent of $r$. 
(They will each be in general a set of numbers, consisting of an eigenvalue of 
each of the set of commuting observables $q_r$.) If we now have any symmetrical 
function of the variables $q'_1, q'_2, \ldots, q'_n$, each point in the domain of this function 
can be specified by $n'_1, n'_2, n'_3, \ldots$, the numbers of $q'$'s equal to $q^{(1)}, q^{(2)}, q^{(3)}, \ldots$ 
respectively. The variables $n'_1, n'_2, n'_3, \ldots$ will do just as well as the variables 
$q'_1, q'_2, \ldots, q'_n$, so long as we are dealing only with symmetrical functions. 
Thus the representatives of states of our assembly satisfying the Einstein-Bose 
statistics may be expressed as functions of the variables $n'_1, n'_2, n'_3, \ldots$ instead of 
the variables $q'_1, q'_2, \ldots, q'_n$. This change is effectively a canonical transformation to 
a new representation in which the rows and columns of the matrices are labelled by 
the observables $n_1, n_2, n_3, \ldots$ which observables are the numbers of systems with 
$q$'s equal to $q^{(1)}, q^{(2)}, q^{(3)}, \ldots$ respectively, or, as we may say, the numbers of systems 
in the states $q^{(1)}, q^{(2)}, q^{(3)}, \ldots$. Since the new observables $n_1, n_2, n_3, \ldots$ are functions 
of the $q_1, q_2, \ldots, q_n$ (non-analytic functions, it is true), the transformation is of 
the trivial kind consisting essentially of a relabelling of the rows and columns and 
the only change to be made in the representative of a state will be that arising 
from the change in the weights of the different points of its domain. To determine 
this change we use the condition 

$$\sum_{n'_1, n'_2, \ldots} |(n'_1n'_2 \ldots|)|^2 = \sum_{q'_1, q'_2, \ldots, q'_n} |(q'_1q'_2 \ldots q'_n|)|^2,$$

from which we can infer that 

$$|(n'_1n'_2 \ldots|)|^2 = \sum |(q'_1q'_2 \ldots q'_n|)|^2, \tag{1}$$

the summation in (1) being over all values of the $q'$'s such that $n'_1$ of them are 
equal to $q^{(1)}$, $n'_2$ equal to $q^{(2)}$, and so on. The number of terms in the summation 
in (1) is $n!/(n'_1!\,n'_2!\,n'_3! \ldots)$ and they are all equal, on account of $(q'_1q'_2 \ldots q'_n|)$ being 
symmetrical. It is thus clear that we must take 

$$(n'_1n'_2 \ldots|) = [n!/n'_1!\,n'_2!\,n'_3! \ldots]^{1/2}\,(q'_1q'_2 \ldots q'_n|). \tag{2}$$
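
The weights in (2) can be checked by direct counting. The following sketch assumes a small hypothetical assembly of $n = 3$ systems, each with two possible states labelled 0 and 1, and an arbitrary symmetrical function of the three variables chosen only for illustration. It applies (2) and confirms that the sum of squares is the same in the two representations.

```python
from itertools import product
from math import factorial, sqrt

def occupation(qs, levels):
    """Occupation numbers (n_1, n_2, ...) of a point (q_1, ..., q_n)."""
    return tuple(qs.count(level) for level in range(levels))

def multinomial(ns):
    """n!/(n_1! n_2! ...), the number of points q sharing the occupation numbers ns."""
    out = factorial(sum(ns))
    for k in ns:
        out //= factorial(k)
    return out

def to_n_representation(psi_q, n, levels):
    """Apply (2): (n_1 n_2 ...|) = [n!/(n_1! n_2! ...)]^(1/2) (q_1 ... q_n|)."""
    psi_n = {}
    for qs in product(range(levels), repeat=n):
        ns = occupation(qs, levels)
        psi_n[ns] = sqrt(multinomial(ns)) * psi_q[qs]
    return psi_n

# A hypothetical symmetrical function of n = 3 variables, each taking 2 values.
n, levels = 3, 2
psi_q = {qs: 1.0 + sum(qs) for qs in product(range(levels), repeat=n)}
psi_n = to_n_representation(psi_q, n, levels)

norm_q = sum(v * v for v in psi_q.values())
norm_n = sum(v * v for v in psi_n.values())
print(norm_q, norm_n)
```

The function depends on the $q$'s only through their sum and is therefore symmetrical, which is what allows a single value to be attached to each set of occupation numbers.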


The question of interest now is to express the Hamiltonian of the system in 
terms of the new observables $n_1, n_2, n_3, \ldots$. We can do this by writing down its 
representative in the $q$-representation and transforming to the $n$-representation. 
Since the transformation is of an unusual kind, the most convenient way of 
making it is to write down the whole Schrödinger equation and to transform that. 
This Schrödinger equation is 

$$ih\frac{\partial}{\partial t}(q'_1q'_2 \ldots q'_n|) = \sum_{q''_1, q''_2, \ldots, q''_n} (q'_1q'_2 \ldots q'_n|H|q''_1q''_2 \ldots q''_n)(q''_1q''_2 \ldots q''_n|). \tag{3}$$

The Hamiltonian $H$ is of the form 

$$H = \sum_r U_r,$$

where $U_r$ is the energy associated with the $r$-th system, consisting of its proper 
energy together with its interaction energy with the external field of force, and is 
a function of the dynamical variables of the $r$-th system only. The representative of 
$U_r$ in the $q_r$-representation will be $(q'_r|U_r|q''_r)$, which will be a matrix independent 
of $r$, i.e. the same for each of the $n$ systems. Its elements may also be 
written $(q^{(a)}|U|q^{(b)})$ or $U_{ab}$ for brevity. The representative of $U_r$ in the complete 
$q$-representation will be 

$$(q'_1q'_2 \ldots q'_n|U_r|q''_1q''_2 \ldots q''_n) = (q'_r|U_r|q''_r)\,\delta_{q'_1q''_1}\delta_{q'_2q''_2} \cdots \delta_{q'_{r-1}q''_{r-1}}\delta_{q'_{r+1}q''_{r+1}} \cdots \delta_{q'_nq''_n}.$$

This makes the Schrödinger equation (3) reduce to 

$$ih\frac{\partial}{\partial t}(q'_1q'_2 \ldots q'_n|) = \sum_r \Big[(q'_r|U_r|q'_r)(q'_1q'_2 \ldots q'_n|) + \sum_{q''_r \neq q'_r} (q'_r|U_r|q''_r)(q'_1q'_2 \ldots q'_{r-1}q''_rq'_{r+1} \ldots q'_n|)\Big], \tag{4}$$

the terms arising from the diagonal matrix elements of $H$ being separated from 
the non-diagonal ones for convenience later. 

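If we now make the transformation to the n-representation, using equation (2),
equation (4) becomes

$$i\hbar\,\frac{\partial}{\partial t}(n_1 n_2 \ldots|) = \sum_r (q_r|U_r|q_r)(n_1 n_2 \ldots|) + \sum_r\sum_{q'_r \neq q_r}\big[(n_{q'_r} + 1)/n_{q_r}\big]^{1/2}(q_r|U_r|q'_r)(n_1 n_2 \ldots n_{q_r} - 1 \ldots n_{q'_r} + 1 \ldots|), \tag{5}$$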
after removal of the factor $[n_1!\,n_2!\,n_3!\ldots/n!]^{1/2}$ throughout. The sum $\sum_r (q_r|U_r|q_r)$
in (5) means a sum of terms each of the type $(q^{(a)}|U|q^{(a)})$ or $U_{aa}$, the number of
times this type occurs being the number of q’s that equal $q^{(a)}$, which is just $n_a$.
Thus this sum is equal to $\sum_a n_a U_{aa}$. Again, the double sum $\sum_r\sum_{q'_r \neq q_r}$ in (5)
consists of terms each of the type $[(n_b + 1)/n_a]^{1/2}\,U_{ab}\,(n_1 n_2 \ldots n_a - 1 \ldots n_b + 1 \ldots|)$
with $b \neq a$. The number of times a given type occurs is equal to the number of ways
of choosing r and $q'_r$ such that $q_r = q^{(a)}$ and $q'_r = q^{(b)}$. This is just $n_a$, the number
of ways of choosing r such that $q_r = q^{(a)}$, since there is always just one way of
choosing $q'_r = q^{(b)}$. Equation (5) thus reduces to


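$$i\hbar\,\frac{\partial}{\partial t}(n_1 n_2 \ldots|) = \sum_a n_a U_{aa}(n_1 n_2 \ldots|) + \sum_a\sum_{b \neq a} n_a^{1/2}(n_b + 1)^{1/2}\,U_{ab}\,(n_1 n_2 \ldots n_a - 1 \ldots n_b + 1 \ldots|),$$

which may be written

$$i\hbar\,\frac{\partial}{\partial t}(n_1 n_2 n_3 \ldots|) = \sum_{a,b} n_a^{1/2}(n_b + 1 - \delta_{ab})^{1/2}\,U_{ab}\,(n_1 n_2 \ldots n_a - 1 \ldots n_b + 1 \ldots|) \tag{6}$$

if by $(n_1 n_2 \ldots n_a - 1 \ldots n_b + 1 \ldots|)$ when b = a we understand simply $(n_1 n_2 \ldots n_a \ldots|)$.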

The eigenvalues of each of our new dynamical variables $n_1, n_2, \ldots$ are
the integers 0, 1, 2, 3, .... They are thus the same, apart from the factor ħ,
as those of the action variable J in the problem of the simple harmonic
oscillator, when the arbitrary constant in this action variable is chosen as in
equation (22) of §41. Hence each $n_a$ is a dynamical variable of the same nature as
the action variable of a simple harmonic oscillator and we can introduce an angle
variable $w_a$ canonically conjugate to it, or rather we can introduce $e^{iw_a}$ and $e^{-iw_a}$.
Corresponding to equations (24) of §41 we shall have

$$e^{iw_a}n_a = (n_a - 1)\,e^{iw_a}, \qquad e^{-iw_a}n_a = (n_a + 1)\,e^{-iw_a}. \tag{7}$$

Also we have that $e^{iw_a}$, $e^{-iw_a}$ and $n_a$ commute with $e^{iw_b}$, $e^{-iw_b}$ and $n_b$ for $b \neq a$.
The new observables $e^{iw_a}$ and $e^{-iw_a}$ are defined by their matrix representatives in
a representation in which $n_a$ is diagonal, like the $e^{iw}$ and $e^{-iw}$ of §41. From the form
of these matrix representatives it follows that when $e^{-iw_a}$ is multiplied into
a ψ-symbol whose representative is $(n_1 n_2 \ldots n_a \ldots|)$, the representative of
the product is

$$(n_1 n_2 \ldots n_a + 1 \ldots|),$$

and when $e^{iw_a}$ is multiplied into this ψ-symbol, the representative of the product is

$$(n_1 n_2 \ldots n_a - 1 \ldots|) \quad \text{for } n_a \geq 1, \qquad 0 \quad \text{for } n_a = 0.$$

This means that when $e^{-iw_a}$ and $e^{iw_a}$ are multiplied into ψ-symbols, they are
equivalent to the operations of substitution of $n_a + 1$ and $n_a - 1$ for $n_a$ respectively,
the second substitution being understood to give the result zero for $n_a = 0$.
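The substitution rules and the relations (7) can be tried out on a truncated matrix representation. This is a minimal sketch under the assumption that the representative $(n|)$ is cut off at a finite maximum n; the helper names are invented for the illustration, and the zero that appears at the top of the range is an artefact of the cut-off, not of the theory.

```python
def shift_matrices(dim):
    """Truncated matrices for n, e^{iw} and e^{-iw} on representatives (n|), n = 0..dim-1."""
    number = [[n if m == n else 0 for m in range(dim)] for n in range(dim)]
    # e^{-iw}: the new representative at n is the old one at n + 1.
    e_minus = [[1 if m == n + 1 else 0 for m in range(dim)] for n in range(dim)]
    # e^{iw}: the new representative at n is the old one at n - 1 (zero for n = 0).
    e_plus = [[1 if m == n - 1 else 0 for m in range(dim)] for n in range(dim)]
    return number, e_plus, e_minus


def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def add_identity(a, k):
    """Return a + k * identity."""
    return [[a[i][j] + (k if i == j else 0) for j in range(len(a))]
            for i in range(len(a))]


def apply(matrix, representative):
    """Multiply an operator into a psi-symbol given by its representative."""
    return [sum(row[m] * representative[m] for m in range(len(row))) for row in matrix]


number, e_plus, e_minus = shift_matrices(4)
psi = [10, 11, 12, 13]               # hypothetical values of (n|) for n = 0, 1, 2, 3
after_e_minus = apply(e_minus, psi)  # substitution of n + 1 for n
after_e_plus = apply(e_plus, psi)    # substitution of n - 1 for n, zero at n = 0
```

Comparing `matmul(e_plus, number)` with `matmul(add_identity(number, -1), e_plus)` reproduces the first of relations (7), and the analogous comparison with `e_minus` and `+1` reproduces the second.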

We can now express the operator on the right-hand side of (6) explicitly in
terms of the $n_a$ and their canonical conjugates $w_a$. It is, in fact, just

$$\sum_{a,b} n_a^{1/2}(n_b + 1 - \delta_{ab})^{1/2}\,U_{ab}\,e^{iw_a}e^{-iw_b} = \sum_{a,b} n_a^{1/2}e^{iw_a}\,U_{ab}\,(n_b + 1)^{1/2}e^{-iw_b} \tag{8}$$

with the help of (7). This quantity (8) is our Hamiltonian expressed in
terms of the new dynamical variables $n_a$ and $w_a$. The $U_{ab}$ are, of course,
just numerical coefficients.

We can easily generalize this result to apply to a more general type of
Hamiltonian, namely, that describing the perturbation of the assembly, not by
an external field of force, but by some other atomic system, which we shall call
for definiteness the perturber, the reaction of the assembly on the perturber
being taken into account. We now have to introduce some more dynamical
variables, β say, to describe the perturber. Our Hamiltonian will be of the form

$$H = H_P + \sum_r U_r, \tag{9}$$

where $H_P$ is the Hamiltonian that describes the perturber alone and $U_r$ is
the energy associated with the r-th system of the assembly, consisting of its proper
energy plus its interaction energy with the perturber. $H_P$ will be a function
of the β’s only and $U_r$ will be a function of the variables describing the r-th
system and also the β’s. We can express the new sum $\sum_r U_r$ in terms of
the $n_a$, $w_a$ variables by the same method as before and the result will be of the same
form (8), with the difference that the $U_{ab}$’s will no longer be numbers but will be
functions of the β’s. The definition of $U_{ab}$ will now be that its representative in
the ζ-representation, the ζ’s being any complete set of commuting observables
taken out of the β’s, is

$$(\zeta'|U_{ab}|\zeta'') = (\zeta' q^{(a)}|U|\zeta'' q^{(b)}), \tag{10}$$

the matrix on the right being the representative of $U_r$ in the representation in
which $q_r$ and ζ are diagonal. We shall still have $U_{ab}$ commuting with the n’s
and w’s.
It is possible to express any function of the dynamical variables that is
symmetrical between all the particles in terms of the new variables $n_a$ and $w_a$.
The transformation may be conveniently carried out by considering the function
of the dynamical variables to be the Hamiltonian for some dynamical system and
then writing down the Schrödinger equation and transforming that. The general
case has been considered by Jordan.†


68. Discussion of Einstein-Bose Assemblies 


In the preceding section we saw how the Hamiltonian describing an Einstein-Bose 
assembly, or more generally any symmetrical function of the dynamical variables 
of all the systems of the assembly, can be expressed in terms of variables $n_a$ and $w_a$,
analogous to the action and angle variables of a simple harmonic oscillator. 
This shows that an Einstein-Bose assembly is dynamically equivalent to a set of 
simple harmonic oscillators, there being one oscillator corresponding to each of 
a complete set of independent states of a system of the assembly, the quantum 
number of the oscillator corresponding to the number of systems in the state. 

We may replace the set of simple harmonic oscillators by a train of waves, 
each Fourier component of the waves being dynamically equivalent to a simple 
harmonic oscillator. Thus our Einstein-Bose assembly is dynamically equivalent 
to a system of waves. This provides us with a complete reconciliation between the 
corpuscular and wave theories of radiation. We may regard radiation either as an 
assembly of photons satisfying the Einstein-Bose statistics or as a system of waves, 
the two points of view being consistent and mathematically equivalent. 

We can gain a greater insight into the connexion between the systems of
an Einstein-Bose assembly by considering the limiting case when the number
of systems in each state is large, i.e. when the n’s are large. We introduce
the observable*

$$\xi_a = (n_a + 1)^{1/2}e^{-iw_a} = e^{-iw_a}n_a^{1/2},$$

whose conjugate complex is

$$\bar{\xi}_a = e^{iw_a}(n_a + 1)^{1/2} = n_a^{1/2}e^{iw_a}.$$

This $\xi_a$ is the analogue of $p - iq$ for the harmonic oscillator, apart from numerical
coefficients. We have

$$\xi_a\bar{\xi}_a = n_a + 1, \qquad \bar{\xi}_a\xi_a = n_a, \tag{11}$$


† Jordan, Pascual. (1927). Über Wellen und Korpuskeln in der Quantenmechanik. Zeitschrift
für Physik, 45(11-12), pp. 766-775. doi:10.1007/bf01329554
* {in original the first equation omits the final a index}
and thus

$$\xi_a\bar{\xi}_a - \bar{\xi}_a\xi_a = 1. \tag{12}$$
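Relations (11) and (12) can be verified on truncated matrices for a single state a, taking $\xi = (n + 1)^{1/2}e^{-iw}$ and its conjugate complex $n^{1/2}e^{iw}$. The sketch below assumes a cut-off at a finite number of levels; the function names are hypothetical. Because of the cut-off the last diagonal entry of the commutator is spoiled, so only the entries below it should be compared with (12).

```python
import math


def xi_matrices(dim):
    """Truncated matrices of xi = (n+1)^{1/2} e^{-iw} and its conjugate, n = 0..dim-1."""
    xi = [[math.sqrt(n + 1) if m == n + 1 else 0.0 for m in range(dim)]
          for n in range(dim)]
    xi_bar = [[math.sqrt(n) if m == n - 1 else 0.0 for m in range(dim)]
              for n in range(dim)]
    return xi, xi_bar


def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


dim = 6
xi, xi_bar = xi_matrices(dim)
xi_xi_bar = matmul(xi, xi_bar)   # diagonal entries n + 1, except at the cut-off
xi_bar_xi = matmul(xi_bar, xi)   # diagonal entries n
commutator_diagonal = [xi_xi_bar[n][n] - xi_bar_xi[n][n] for n in range(dim)]
```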


We can now express the Hamiltonian (8), describing the perturbation of
the assembly by an external field of force, in terms of the $\xi_a$’s and their conjugate
complexes, the result being

$$H = \sum_{a,b}\bar{\xi}_a U_{ab}\xi_b.$$

The equations of motion for the $\xi_a$’s are

$$i\hbar\,\dot{\xi}_a = \xi_a H - H\xi_a = \sum_b U_{ab}\xi_b, \tag{13}$$

with the help of (12) and the condition that $\xi_a$ commutes with $\xi_b$ and $\bar{\xi}_b$ when $b \neq a$.
When the $n_a$’s are large, the $\xi_a$’s are also large and we may neglect the unity
on the right-hand side of (12). With this approximation our observables† $\xi_a$ and $\bar{\xi}_b$
all commute with each other and may be counted as numbers. The equations
of motion (13) now become ordinary differential equations between numbers.
These equations are identical to the Schrödinger equation for a single one of
the systems perturbed by the external field of force, the set of numbers $\xi_a$ playing
the part of the Schrödinger function $(q^{(a)}|)$ and $U_{ab}$ being the representative of
the Hamiltonian. If this Schrödinger function is normalized to n, it may be
considered to represent an assembly of n independent systems in the way discussed
in §56. The interpretation of the Schrödinger function, namely the interpretation
of $|(q^{(a)}|)|^2$ as the number of systems in state $q^{(a)}$, now corresponds exactly
to the interpretation of the $\xi_a$’s provided by equation (11). We thus have
the result that an assembly of a large number of similar systems is described
by the same equations, whose solutions are to be interpreted in the same way,
whether the systems are independent or satisfy the Einstein-Bose statistics.
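To see the large-n limit at work, the equations of motion (13) may be integrated with the $\xi_a$ treated as ordinary complex numbers. The sketch below is only an illustration: it assumes units in which ħ = 1, a hypothetical two-state matrix $U_{ab}$, and a Runge-Kutta integrator chosen for convenience. The sum of the squared moduli, which plays the part of the total number of systems, stays constant while the systems are redistributed between the states.

```python
import math

HBAR = 1.0  # units with hbar = 1, an assumption made for this sketch


def derivative(u, xi):
    """d(xi_a)/dt from (13): (1/(i*hbar)) * sum_b U_ab xi_b."""
    size = len(xi)
    return [sum(u[a][b] * xi[b] for b in range(size)) / (1j * HBAR)
            for a in range(size)]


def rk4_step(u, xi, dt):
    """One fourth-order Runge-Kutta step for the numbers xi_a."""
    k1 = derivative(u, xi)
    k2 = derivative(u, [x + 0.5 * dt * k for x, k in zip(xi, k1)])
    k3 = derivative(u, [x + 0.5 * dt * k for x, k in zip(xi, k2)])
    k4 = derivative(u, [x + dt * k for x, k in zip(xi, k3)])
    return [x + dt / 6.0 * (a + 2 * b + 2 * c + d)
            for x, a, b, c, d in zip(xi, k1, k2, k3, k4)]


def occupation_numbers(xi):
    """Interpretation (11) for large n: n_a is the squared modulus of xi_a."""
    return [abs(x) ** 2 for x in xi]


# Hypothetical two-state example: a real symmetric U_ab and 1000 systems in state 1.
u = [[1.0, 0.3], [0.3, 2.0]]
xi = [complex(math.sqrt(1000.0)), 0j]
for _ in range(2000):                # integrate to t = 2 with dt = 0.001
    xi = rk4_step(u, xi, 0.001)
n_1, n_2 = occupation_numbers(xi)
```

For this two-state case the result can be compared with the familiar closed-form oscillation of a two-level Schrödinger function, which is the sense in which (13) is "identical to the Schrödinger equation for a single one of the systems".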

Since an assembly of independent systems and an assembly satisfying 
the Einstein-Bose statistics are two physically different things, it may seem strange 
that they are both to be described by the same set of equations, even though 
we are restricting ourselves to the limiting case of a large number of systems 
in the assembly. The solution of this paradox lies in the fact that there is 
an essential difference between the mathematical treatments of the two assemblies, 
in spite of the similarities pointed out above, as may be seen from the following 
discussion. An assembly of independent systems is described as completely as 
quantum mechanics allows when we are given the number of systems in each state. 
The modulus of the Schrödinger function $(q^{(a)}|)$ is then determined for each
state $q^{(a)}$, but not its phase. This phase has no physical meaning. We must
average over all values of this phase if it appears in the result of any calculation. 





† {the original has a as the qualifying indices}
On the other hand, for an assembly satisfying the Einstein-Bose statistics, the $\xi_a$’s
are observables and their phases as well as their moduli are of physical importance.
An Einstein-Bose assembly is not described as completely as it might be unless
the phases of the $\xi_a$’s are given as well as their moduli.

When we do not take the limiting case of a large number of systems, 
the differences between the Einstein-Bose assembly and independent assembly 
are greater. To obtain the equations for the Einstein-Bose assembly from those for 
the independent assembly we must apply a sort of quantization to the Schrédinger 
function, i.e. we must replace the numbers composing the Schrödinger function by
observables satisfying definite commutability relations. 


69. Application to Photons 


In applications of the above theory it is convenient to take the q’s to be constants
of the motion for an unperturbed system, so that the $q^{(a)}$’s label the stationary
states of the unperturbed systems and the $n_a$’s are the numbers of systems in
the stationary states. In the case of photons this means we must take the q’s
to be the three Cartesian components of momentum together with a variable 
specifying the polarization, which variable may be taken to be the direction of 
the electric vector for a linearly polarized photon. The polarization variable will 
now continually occur in our calculations along with the momentum. For brevity 
this polarization variable will usually not be explicitly mentioned but will be 
understood. Thus when we say a certain photon has a definite momentum, it is 
to be understood that it has also a definite polarization, and the set of three 
variables $p_x, p_y, p_z$ (which may be abridged to p) specifying this momentum
is to be understood as containing a fourth variable specifying the direction of
the electric vector. Again, when it is said that an integration is made over all
values of the variables $p_x, p_y, p_z$, a summation over the two independent states of
polarization is implied as well. 

We can apply the theory at the end of §67 to the interaction of a number of
photons with an atom, the atom being the perturber. The energy U for a photon
will consist of its proper energy $h\nu$ together with its interaction energy with
the atom, V say. Hence

$$U_{ab} = h\nu_a\delta_{ab} + V_{ab},$$

$\nu_a$ being the frequency of a photon in the stationary state a. The $V_{ab}$’s,
like the $U_{ab}$’s, will be functions of the dynamical variables of the atom. The total
Hamiltonian, given by (9) and (8), may now be written

$$\begin{aligned} H &= H_P + \sum_{a,b} n_a^{1/2}e^{iw_a}\,U_{ab}\,(n_b + 1)^{1/2}e^{-iw_b} \\ &= H_P + \sum_a n_a h\nu_a + \sum_{a,b} n_a^{1/2}e^{iw_a}\,V_{ab}\,(n_b + 1)^{1/2}e^{-iw_b} \\ &= H_P + H_R + H_Q, \end{aligned} \tag{14}$$

$H_R$ being the total proper energy of the radiation and $H_Q$ the total
interaction energy.

Now photons have the peculiarity that they can be created and annihilated, 
as happens whenever one of them is emitted or absorbed by an atom, 
while our theory of the Einstein-Bose assembly has been built up on the basis 
of the conservation of the total number of systems. We can, however, reconcile 
our theory with this peculiarity of the photons by assuming a zero state for 
the photons, in which they have no momentum and energy and are not physically in 
evidence. We can now say that when a photon is absorbed or emitted, it jumps into 
or out of this zero state respectively, and can in this way preserve the constancy of 
the total number of photons. Since there is no limit to the number of photons that 
may be emitted, we must assume the number in the zero state to be infinite, 
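i.e. $n_0 = \infty$. This makes the angle variable conjugate to $n_0$ a constant of
the motion, since

$$\begin{aligned} i\hbar\,\frac{d}{dt}e^{iw_0} &= e^{iw_0}H - He^{iw_0} \\ &= (e^{iw_0}n_0 - n_0e^{iw_0})(h\nu_0 + V_{00}) \\ &\quad + \big[e^{iw_0}n_0^{1/2} - n_0^{1/2}e^{iw_0}\big]\sum_{b \neq 0}e^{iw_0}V_{0b}(n_b + 1)^{1/2}e^{-iw_b} \\ &\quad + \sum_{a \neq 0}n_a^{1/2}e^{iw_a}V_{a0}\big[e^{iw_0}(n_0 + 1)^{1/2} - (n_0 + 1)^{1/2}e^{iw_0}\big]e^{-iw_0} \\ &= 0, \end{aligned}$$

since $\nu_0$ and $V_{00}$ vanish and the quantities in square brackets [ ] are of order $n_0^{-1/2}$.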

In order that the Hamiltonian (14) may remain finite when $n_0$ is infinite,
$V_{a0}$ and $V_{0a}$ must be infinitely small. We shall suppose that they are infinitely
small in such a way that their products with $n_0^{1/2}$ are finite and we shall put

$$V_{a0}(n_0 + 1)^{1/2}e^{-iw_0} = V_a, \qquad V_{0a}n_0^{1/2}e^{iw_0} = \bar{V}_a, \tag{15}$$

$V_a$ and $\bar{V}_a$ being two new conjugate complex dynamical variables. We may count $V_a$
and $\bar{V}_a$ as functions only of the dynamical variables describing the atom, like $V_{a0}$
and $V_{0a}$, since the other factors on the left-hand sides of (15) are constants of
the motion ($n_0$ being effectively constant since changes in $n_0$ are small compared
with $n_0$) and have no physical significance. The interaction energy $H_Q$ in (14) may
now be written

$$H_Q = \sum_a\{V_a n_a^{1/2}e^{iw_a} + \bar{V}_a(n_a + 1)^{1/2}e^{-iw_a}\} + \sum_{a,b}V_{ab}\,n_a^{1/2}e^{iw_a}(n_b + 1)^{1/2}e^{-iw_b}, \tag{16}$$

the values a = 0, b = 0 being understood to be excluded from the summations here.

A photon has a continuous range of stationary states and not a discrete set, 
since its components of momentum may have any values from $-\infty$ to $\infty$.
We therefore have to change the sums in (16) into integrals. To do this accurately 
would not be very easy, since it would mean dealing according to quantum 
mechanics with a dynamical system with continuously many degrees of freedom, 
which would require a new scheme of notation and a new mathematical technique. 
We are, however, interested in the interaction energy (16) mainly with regard to 
the limiting case of large n’s, when classical mechanics may be assumed to apply 
for the radiation, since we wish to compare the interaction energy in this case with 
that provided by classical electromagnetic theory and thus obtain expressions for 
the $V_a$’s and $V_{ab}$’s. In this limiting case the passage from sums to integrals is
quite easy. 

Let $\sigma_a$ denote the number of states of the photon (with a particular
polarization) per unit of momentum space about the momentum $p_a$. We assume
$\sigma_a$ to be large, but an arbitrary function of $p_a$, and investigate the limit of (16)
when $\sigma_a$ is made infinite. The number of photons (with a particular polarization)
per unit of momentum space about the momentum $p_a$ is

$$\mathfrak{n}_a = n_a\sigma_a,$$

provided $n_a$ varies in some roughly continuous way from one state to the next.
Let $(p'|V|p'')$ be the matrix* representing the interaction energy V for one photon
in the ordinary p-representation for that photon. This ordinary p-representation
differs from the one we have used up to the present in this chapter, in which V is
represented by $V_{ab}$, only through the weight function. In the former representation
the weight attached to a small domain $\delta p_a$ of momentum space is just $\delta p_a$, while in
the latter it is the number of discrete states in this domain, which is $\sigma_a\delta p_a$.





*The matrix elements of this matrix are actually functions of the dynamical variables 
describing the atom, like the $V_{ab}$’s, and not numbers, but this does not invalidate the argument.
The representation is an ‘incomplete’ one, the representatives being defined in terms of those of 
a complete one by equations like (10). 
The weight function is thus changed by a factor $\sigma_a$. The rule at the end of §24
now shows that the matrix elements in the two representations are connected by

$$(p_a|V|p_b) = V_{ab}(\sigma_a\sigma_b)^{1/2}. \tag{17}$$

Similarly the matrix elements $(p|V|0)$, $(0|V|p)$, referring to transitions into or out
of the zero state, are connected with $V_a$ and $\bar{V}_a$ by

$$(p_a|V|0) = V_a\sigma_a^{1/2}, \qquad (0|V|p_a) = \bar{V}_a\sigma_a^{1/2}.$$

We can now express the interaction energy (16) in the limiting case of large n’s,
when the n’s may be assumed to commute with the w’s, in the form

$$\begin{aligned} &\sum_a\{(p_a|V|0)\,\mathfrak{n}_a^{1/2}e^{iw_a} + (0|V|p_a)\,\mathfrak{n}_a^{1/2}e^{-iw_a}\}\,\sigma_a^{-1} + \sum_{a,b}(p_a|V|p_b)\,\mathfrak{n}_a^{1/2}\mathfrak{n}_b^{1/2}e^{i(w_a - w_b)}\,\sigma_a^{-1}\sigma_b^{-1} \\ &\quad= \int\{(p_a|V|0)\,\mathfrak{n}_a^{1/2}e^{iw_a} + (0|V|p_a)\,\mathfrak{n}_a^{1/2}e^{-iw_a}\}\,dp_a + \iint(p_a|V|p_b)\,\mathfrak{n}_a^{1/2}\mathfrak{n}_b^{1/2}e^{i(w_a - w_b)}\,dp_a\,dp_b \end{aligned} \tag{18}$$

in the limit $\sigma \to \infty$. The fact that the σ’s have disappeared from this result
justifies our method of dealing with a continuous range of states as a limiting case
of a discrete set.
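The disappearance of the σ’s can be followed for a single term of (16). The lines below are a worked restatement, not part of the original argument; they write $\mathfrak{n}_a = n_a\sigma_a$ for the number of photons per unit of momentum space and assume that, for closely spaced states, a sum over the discrete states a is to be replaced by an integral weighted with the density of states, $\sum_a \to \int\sigma_a\,dp_a$.

$$V_a n_a^{1/2} = (p_a|V|0)\,\sigma_a^{-1/2}\cdot(\mathfrak{n}_a/\sigma_a)^{1/2} = (p_a|V|0)\,\mathfrak{n}_a^{1/2}\,\sigma_a^{-1},$$

$$\sum_a V_a n_a^{1/2}e^{iw_a} = \sum_a (p_a|V|0)\,\mathfrak{n}_a^{1/2}e^{iw_a}\,\sigma_a^{-1} \;\longrightarrow\; \int (p_a|V|0)\,\mathfrak{n}_a^{1/2}e^{iw_a}\,\sigma_a^{-1}\,\sigma_a\,dp_a = \int (p_a|V|0)\,\mathfrak{n}_a^{1/2}e^{iw_a}\,dp_a.$$

The double sum goes over in the same way, each index supplying one factor $\sigma^{-1}$ from the matrix element and occupation number and one factor σ from the measure. For large n’s the difference between $(n_a + 1)^{1/2}$ and $n_a^{1/2}$ is neglected.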


70. Determination of the Interaction Energy 
between a Photon and Atom 


We shall now determine the matrix elements $(p|V|0)$, $(0|V|p)$ and $(p|V|p')$
by comparing (18) with the classical expression for the interaction energy between
an atom and a field of radiation. For simplicity we shall suppose the atom
to consist of a single electron moving in an electrostatic field of force. The field of
radiation may be described by the 4-vector potential, which is to a certain extent
arbitrary and may be chosen so that its time component vanishes. The field is
then completely described by the magnetic potential $A_x, A_y, A_z$ or A. The change
that the field causes in the Hamiltonian describing the atom is now, as explained
at the beginning of §48,

$$\frac{1}{2m}\Big\{\Big(p + \frac{e}{c}A\Big)^2 - p^2\Big\} = \frac{e}{c}(\dot{x}, A) + \frac{e^2}{2mc^2}A^2. \tag{19}$$
This is the classical interaction energy, which is to be compared with (18). The A
that occurs here ought really to be the value of the magnetic potential at the 
point where the electron is momentarily situated. It is, however, a good enough 
approximation if we take this A to be the magnetic potential at some fixed point 
in the atom, such as the nucleus, provided we are not dealing with radiation whose 
wave-length is small compared with the dimensions of the atom. 

To make the comparison between (18) and (19) we must first resolve the field
of radiation into plane progressive waves. The electric and magnetic fields of
one of these waves, whose frequency is ν and whose direction is specified by the
momentum p of the associated photons, are of the form

$$\mathcal{E}_p\cos[(x, p)/\hbar + 2\pi\nu t + \gamma_p], \qquad \mathcal{H}_p\cos[(x, p)/\hbar + 2\pi\nu t + \gamma_p],$$

the amplitudes $\mathcal{E}_p$ and $\mathcal{H}_p$ being vectors of equal length that are perpendicular to
the direction of motion and to each other. The total electric and magnetic fields
are expressible as Fourier integrals of the form

$$\mathcal{E} = \int\mathcal{E}_p\cos[(x, p)/\hbar + 2\pi\nu t + \gamma_p]\,dp, \qquad \mathcal{H} = \int\mathcal{H}_p\cos[(x, p)/\hbar + 2\pi\nu t + \gamma_p]\,dp,$$

$\mathcal{E}_p$, $\mathcal{H}_p$ and $\gamma_p$ being definite functions of the momentum p.


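We must obtain the distribution of energy of this field over the various Fourier
components. At time t = 0 we have*

$$\begin{aligned}\int\mathcal{E}^2\,dx &= \iint(\mathcal{E}_p, \mathcal{E}_{p'})\,dp\,dp'\int\cos[(x, p)/\hbar + \gamma_p]\cos[(x, p')/\hbar + \gamma_{p'}]\,dx \\ &= \iint(\mathcal{E}_p, \mathcal{E}_{p'})\,dp\,dp'\cdot\tfrac{1}{2}h^3\{\cos(\gamma_p + \gamma_{p'})\,\delta(p + p') + \cos(\gamma_p - \gamma_{p'})\,\delta(p - p')\},\end{aligned}$$

the integration with respect to x here being similar to that with respect to q
performed in §35. Thus

$$\int\mathcal{E}^2\,dx = \tfrac{1}{2}h^3\int\{(\mathcal{E}_p, \mathcal{E}_{-p})\cos(\gamma_p + \gamma_{-p}) + \mathcal{E}_p^2\}\,dp.$$

Similarly

$$\int\mathcal{H}^2\,dx = \tfrac{1}{2}h^3\int\{(\mathcal{H}_p, \mathcal{H}_{-p})\cos(\gamma_p + \gamma_{-p}) + \mathcal{H}_p^2\}\,dp.$$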





** replaces ‘.’ 
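On account of the connexion between the vectors $\mathcal{E}_p$ and $\mathcal{H}_p$ we have $\mathcal{E}_p^2 = \mathcal{H}_p^2$
and also $(\mathcal{E}_p, \mathcal{E}_{-p}) = -(\mathcal{H}_p, \mathcal{H}_{-p})$. Hence the total energy is*

$$1/8\pi\cdot\int(\mathcal{E}^2 + \mathcal{H}^2)\,dx = h^3/8\pi\cdot\int\mathcal{E}_p^2\,dp,$$

and the energy per unit of momentum space is $h^3/8\pi\cdot\mathcal{E}_p^2$. This may be equated
to $h\nu_p\mathfrak{n}_p$, the $\mathfrak{n}$ having the same meaning as in the preceding section. Thus

$$\mathcal{E}_p^2 = 8\pi h^{-2}\nu_p\mathfrak{n}_p.$$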


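The vector potential A may be expressed as a Fourier integral in the same way
as $\mathcal{E}$ and $\mathcal{H}$. We shall have

$$A = -\int A_p\sin[(x, p)/\hbar + 2\pi\nu t + \gamma_p]\,dp, \tag{20}$$

the vector $A_p$ being in the same direction as $\mathcal{E}_p$ and having its length given by

$$A_p^2 = \Big(\frac{c}{2\pi\nu_p}\Big)^2\mathcal{E}_p^2 = \frac{2c^2}{\pi h^2\nu_p}\,\mathfrak{n}_p. \tag{21}$$

At the origin A will have the value

$$A = -\int A_p\sin[2\pi\nu t + \gamma_p]\,dp = \int A_p\cos w_p\,dp,$$

$w_p$ being an angle variable of the same nature as those occurring in (18). This value
for A substituted in expression (19) for the interaction energy gives

$$\begin{aligned}&\frac{e}{c}\int(\dot{x}, A_p)\cos w_p\,dp + \frac{e^2}{2mc^2}\iint(A_p, A_{p'})\cos w_p\cos w_{p'}\,dp\,dp' \\ &\quad= \frac{e}{h}\Big(\frac{2}{\pi}\Big)^{1/2}\int\dot{x}_p\,\frac{\mathfrak{n}_p^{1/2}}{\nu_p^{1/2}}\cos w_p\,dp + \frac{e^2}{\pi mh^2}\iint\cos\theta_{pp'}\,\frac{\mathfrak{n}_p^{1/2}\mathfrak{n}_{p'}^{1/2}}{\nu_p^{1/2}\nu_{p'}^{1/2}}\cos w_p\cos w_{p'}\,dp\,dp',\end{aligned} \tag{22}$$

with the help of (21), where $\dot{x}_p$ is the component of $\dot{x}$ in the direction of $A_p$ or $\mathcal{E}_p$
and $\theta_{pp'}$ is the angle between the vectors $A_p$ and $A_{p'}$.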

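If we write (22) in terms of $e^{iw}$ and $e^{-iw}$ instead of cos w and compare it
with (18), we obtain

$$(p|V|0) = (0|V|p) = \frac{e}{h}\,\frac{\dot{x}_p}{(2\pi\nu_p)^{1/2}}, \qquad (p|V|p') = \frac{e^2}{2\pi mh^2\,\nu_p^{1/2}\nu_{p'}^{1/2}}\cos\theta_{pp'}. \tag{23}$$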
We also find that there are certain terms in (22), namely those involving
$\exp i(w_p + w_{p'})$ or $\exp -i(w_p + w_{p'})$, which have no corresponding terms in (18).
This discrepancy shows the inadequacy of the assumption that the Hamiltonian 
describing the interaction of an assembly of photons with an atom is of the form (9). 
The extra terms in (22) would correspond to transitions in which two photons are 
simultaneously absorbed or emitted and the possibility of such transitions requires 
a more complicated interaction energy than that assumed in (9). The physical 
effects of these terms are, however, small and unimportant, and so we shall 
neglect them. 
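The origin of the unmatched terms is seen at once by writing out the product of cosines. The expansion below is a worked step added for illustration; the coefficient $e^2/\pi mh^2$ quoted at the end is the one read from the second term of (22).

$$\cos w_p\cos w_{p'} = \tfrac{1}{4}\{e^{i(w_p + w_{p'})} + e^{-i(w_p + w_{p'})} + e^{i(w_p - w_{p'})} + e^{-i(w_p - w_{p'})}\}.$$

Under a double integral that is symmetrical between p and p′ the last two terms contribute equally, so that together they are equivalent to $\tfrac{1}{2}e^{i(w_p - w_{p'})}$. This is the part that corresponds to the double integral in (18), and it fixes $(p|V|p')$ at one half of the coefficient $e^2\cos\theta_{pp'}/\pi mh^2\nu_p^{1/2}\nu_{p'}^{1/2}$. The first two terms, in which the two angle variables are added, are the ones for which (18) has no counterpart.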

Equations (23) now give the interaction energy V between a single photon
and the atom. This interaction energy cannot conveniently be expressed explicitly
in terms of dynamical variables. We can get a complete representation of V by
introducing a Heisenberg representation for the variables describing the atom.
If the different stationary states of the atom alone are denoted by $\alpha', \alpha'', \ldots$,
we shall have

$$(p\alpha'|V|0\alpha'') = (0\alpha'|V|p\alpha'') = \frac{e}{h}\,\frac{(\alpha'|\dot{x}_p|\alpha'')}{(2\pi\nu_p)^{1/2}}, \qquad (p'\alpha'|V|p''\alpha'') = \frac{e^2}{2\pi mh^2\,\nu_{p'}^{1/2}\nu_{p''}^{1/2}}\cos\theta_{p'p''}\,\delta_{\alpha'\alpha''}. \tag{24}$$

Each p here is, as before mentioned, to be understood as including not only
the three Cartesian components of momentum of the photon but also a polarization
variable specifying a direction of electric force. The matrix element $(\alpha'|\dot{x}_p|\alpha'')$ is
the component of the vector $(\alpha'|\dot{x}|\alpha'')$ in the direction of the electric force specified
by p and similarly $\theta_{p'p''}$ is the angle between the directions of electric force specified
by $p'$ and $p''$.


71. Emission, Absorption and Scattering of Radiation 


We can now determine directly the coefficients of emission, absorption
and scattering of radiation by substituting in the formulae of Chapter X the values
for the matrix elements given by (24). For the case of emission we can use
formula (56) of Chapter X. This shows that for an atom in a state $\alpha'$ the probability
per unit time per unit solid angle of its spontaneously emitting a photon and
dropping to a state $\alpha''$ of lower energy is

$$\frac{4\pi^2WP}{hc^2}\,\Big|\frac{e}{h}\,\frac{1}{(2\pi\nu)^{1/2}}(\alpha'|\dot{x}_p|\alpha'')\Big|^2. \tag{25}$$

Now the energy and momentum of a photon of frequency ν are

$$W = h\nu, \qquad P = h\nu/c.$$
Again from the Heisenberg law (48) of Chapter VI

$$(\alpha'|\dot{x}_p|\alpha'') = 2\pi i\,\nu(\alpha', \alpha'')\,(\alpha'|x_p|\alpha''),$$

$\nu(\alpha', \alpha'')$ being the frequency connected with transitions from state $\alpha'$ to state $\alpha''$,
which in the present case is just the frequency ν of the emitted radiation.
These results substituted in (25) make the emission coefficient reduce to

$$\frac{(2\pi\nu)^3}{hc^3}\,|(\alpha'|ex_p|\alpha'')|^2. \tag{26}$$


To obtain the rate of emission of energy per unit solid angle we must multiply this
by hν. If we now integrate over all solid angles, we shall obtain for the total rate
of emission of energy

$$\frac{4}{3}\,\frac{(2\pi\nu)^4}{c^3}\,|(\alpha'|ex|\alpha'')|^2. \tag{27}$$

This is in agreement with expression (50) of Chapter VI and justifies
Werner Heisenberg’s assumption for the interpretation of his matrix elements.

In the same way the absorption coefficient, given by formula (59) of Chapter X,
becomes for photons

$$\frac{4\pi^2h^2W}{c^2P}\,\Big|\frac{e}{h}\,\frac{1}{(2\pi\nu)^{1/2}}(\alpha'|\dot{x}_p|\alpha'')\Big|^2 = \frac{8\pi^3\nu}{c}\,|(\alpha'|ex_p|\alpha'')|^2.$$

This absorption coefficient refers to an incident beam of one photon crossing unit
area per unit time per unit energy range. If we take one per unit frequency range
instead of energy range, as is usual when dealing with radiation, the absorption
coefficient becomes

$$\frac{8\pi^3\nu}{hc}\,|(\alpha'|ex_p|\alpha'')|^2.$$

This result is the same as (24) of §53, if we substitute for the $E_\nu$ there the energy
hν of a single photon. Thus the elementary theory of §53, in which the radiation
field is treated as an external perturbation, gives the correct value for the absorption
coefficient. The average absorption for all directions of motion and of polarization
of the incident beam is

$$\frac{8\pi^3\nu}{3hc}\,|(\alpha'|ex|\alpha'')|^2,$$

which is just equal to the emission coefficient (27) divided by the factor $8\pi h\nu^3/c^2$.
This ratio for the absorption and emission coefficients may be verified by
elementary statistical arguments.
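The reductions in this section are simple enough to be checked numerically. The sketch below assumes the readings adopted here: expression (25) with W = hν and P = hν/c, the Heisenberg law in the form $|\dot{x}_p| = 2\pi\nu|x_p|$, expression (26) as $(2\pi\nu)^3|ex_p|^2/hc^3$, expression (27) as $\tfrac{4}{3}(2\pi\nu)^4|ex|^2/c^3$, and the averaged absorption coefficient as $8\pi^3\nu|ex|^2/3hc$. All function names and numerical values are illustrative, and the matrix elements are represented by real moduli.

```python
import math


def emission_coefficient_25(nu, x_p, e, h, c):
    """Expression (25) with W = h*nu, P = h*nu/c and |xdot_p| = 2*pi*nu*|x_p|."""
    energy, momentum = h * nu, h * nu / c
    xdot_p = 2.0 * math.pi * nu * x_p
    matrix_element = e / h * xdot_p / math.sqrt(2.0 * math.pi * nu)
    return 4.0 * math.pi ** 2 * energy * momentum / (h * c ** 2) * matrix_element ** 2


def emission_coefficient_26(nu, x_p, e, h, c):
    """Expression (26)."""
    return (2.0 * math.pi * nu) ** 3 / (h * c ** 3) * (e * x_p) ** 2


def angular_factor(steps=2000):
    """Integral over all directions of sin(theta)**2, which is the sum over the two
    polarizations of the squared component of a fixed unit vector (midpoint rule)."""
    d_theta = math.pi / steps
    total = 0.0
    for k in range(steps):
        theta = (k + 0.5) * d_theta
        total += math.sin(theta) ** 2 * 2.0 * math.pi * math.sin(theta) * d_theta
    return total


def total_emission_27(nu, x, e, c):
    """Expression (27)."""
    return 4.0 / 3.0 * (2.0 * math.pi * nu) ** 4 / c ** 3 * (e * x) ** 2


def mean_absorption(nu, x, e, h, c):
    """Absorption per unit frequency range, averaged over direction and polarization."""
    return 8.0 * math.pi ** 3 * nu / (3.0 * h * c) * (e * x) ** 2


# Arbitrary illustrative numbers, not physical constants.
nu, x, e, h, c = 2.0, 0.5, 1.3, 0.7, 3.0
ratio = total_emission_27(nu, x, e, c) / mean_absorption(nu, x, e, h, c)
```

With these readings `ratio` comes out equal to $8\pi h\nu^3/c^2$ for any choice of the numbers, which is the factor quoted in the text.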

Let us now consider scattering. The true scattering coefficient is given by 
formula (38) of Chapter X. Such scattering of photons will not be accompanied by 
any change of state of the atom on account of the factor $\delta_{\alpha'\alpha''}$ in the expression for
the matrix element $(p'\alpha'|V|p''\alpha'')$ in (24). Thus the final energy $W'$ of the photon
will equal its initial energy $W^0$. The scattering coefficient now reduces to†

$$e^4/m^2c^4\cdot\cos^2\theta_{p'p^0}.$$


This is the same as that given by classical mechanics for the scattering of radiation 
by a free electron. We thus see that the true scattering of radiation by an electron 
in an atom is independent of the atom and is correctly given by the classical 
theory. This result, it should be remembered, holds only provided the wave-length 
of the radiation is large compared with the dimensions of the atom. 

The true scattering is a mathematical concept and cannot be separated out 
experimentally from the total scattering, given by formula (44) of Chapter X. 
Let us see what this total scattering is in the case of photons. A modification must 
now be made in the application of formula (44) of Chapter X. The summation $\sum_k$
in this formula may be considered as representing the contribution to the scattering 
of double transitions consisting of transitions firstly from the initial state to 
state k and secondly from state k to the final state. The first transition may be 
an absorption of the incident photon and the second an emission of the required 
scattered photon, but it is also possible for the first transition to be the emission 
and the second the absorption. It is clear from the general nature of the method 
used for deriving formula (44) of Chapter X that both these kinds of double 
transitions must be included in the summation $\sum_k$ when this formula is applied
to photons, although only the first of them was taken into account in the actual 
derivation given in Chapter X. 

For the double transition of absorption followed by emission we must take,
using zero, single prime and double prime to refer to the initial, final
and intermediate k state respectively,

$$(k|V|p^0\alpha^0) = (0\alpha''|V|p^0\alpha^0), \qquad (p'\alpha'|V|k) = (p'\alpha'|V|0\alpha''),$$

$$E - E_k = h\nu^0 + H_P(\alpha^0) - H_P(\alpha'') = h[\nu^0 - \nu(\alpha'', \alpha^0)],$$

where $\nu^0$ is the frequency of the incident photon and

$$h\nu(\alpha'', \alpha^0) = H_P(\alpha'') - H_P(\alpha^0).$$

Similarly for the double transition of emission followed by absorption we must take

$$(k|V|p^0\alpha^0) = (p'\alpha''|V|0\alpha^0), \qquad (p'\alpha'|V|k) = (0\alpha'|V|p^0\alpha''),$$

$$E - E_k = h\nu^0 + H_P(\alpha^0) - H_P(\alpha'') - h\nu^0 - h\nu' = -h[\nu' + \nu(\alpha'', \alpha^0)],$$




1? replaces ‘.’ 
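where $\nu'$ is the frequency of the scattered photon, there being now two photons,
of frequencies $\nu^0$ and $\nu'$, in existence for the intermediate state k. The expression
for the scattering coefficient now reduces to

$$\frac{e^4}{h^2c^4}\,\frac{\nu'}{\nu^0}\,\Big|\frac{h}{m}\cos\theta_{01}\,\delta_{\alpha^0\alpha'} + \sum_{\alpha''}\Big\{\frac{(\alpha'|\dot{x}_1|\alpha'')(\alpha''|\dot{x}_0|\alpha^0)}{\nu^0 - \nu(\alpha'', \alpha^0)} - \frac{(\alpha'|\dot{x}_0|\alpha'')(\alpha''|\dot{x}_1|\alpha^0)}{\nu' + \nu(\alpha'', \alpha^0)}\Big\}\Big|^2, \tag{28}$$

where $\dot{x}_0$ and $\dot{x}_1$ have been written for $\dot{x}_{p^0}$ and $\dot{x}_{p'}$, the components of $\dot{x}$ in
the directions of the electric vectors of the incident and scattered photons, and $\theta_{01}$
has been written for $\theta_{p^0p'}$, the angle between these electric vectors. If we write (28)
in terms of x instead of $\dot{x}$, we get

$$\frac{(2\pi e)^4}{h^2c^4}\,\frac{\nu'}{\nu^0}\,\Big|\frac{\hbar}{2\pi m}\cos\theta_{01}\,\delta_{\alpha^0\alpha'} - \sum_{\alpha''}\nu(\alpha', \alpha'')\,\nu(\alpha'', \alpha^0)\Big\{\frac{(\alpha'|x_1|\alpha'')(\alpha''|x_0|\alpha^0)}{\nu^0 - \nu(\alpha'', \alpha^0)} - \frac{(\alpha'|x_0|\alpha'')(\alpha''|x_1|\alpha^0)}{\nu' + \nu(\alpha'', \alpha^0)}\Big\}\Big|^2. \tag{29}$$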
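We can simplify (29) with the help of the quantum conditions. We have

$$x_1x_0 - x_0x_1 = 0,$$

which gives

$$\sum_{\alpha''}\{(\alpha'|x_1|\alpha'')(\alpha''|x_0|\alpha^0) - (\alpha'|x_0|\alpha'')(\alpha''|x_1|\alpha^0)\} = 0, \tag{30}$$

and also†

$$x_1\dot{x}_0 - \dot{x}_0x_1 = 1/m\cdot(x_1p_0 - p_0x_1) = i\hbar/m\cdot\cos\theta_{01},$$

which gives†

$$\sum_{\alpha''}\{(\alpha'|x_1|\alpha'')\cdot\nu(\alpha'', \alpha^0)(\alpha''|x_0|\alpha^0) - \nu(\alpha', \alpha'')(\alpha'|x_0|\alpha'')\cdot(\alpha''|x_1|\alpha^0)\} = \frac{1}{2\pi i}\,\frac{i\hbar}{m}\cos\theta_{01}\,\delta_{\alpha^0\alpha'} = \frac{\hbar}{2\pi m}\cos\theta_{01}\,\delta_{\alpha^0\alpha'}. \tag{31}$$

Multiplying (30) by $\nu'$ and adding to (31), we obtain†

$$\sum_{\alpha''}\{(\alpha'|x_1|\alpha'')(\alpha''|x_0|\alpha^0)[\nu' + \nu(\alpha'', \alpha^0)] - (\alpha'|x_0|\alpha'')(\alpha''|x_1|\alpha^0)[\nu' + \nu(\alpha', \alpha'')]\} = \hbar/2\pi m\cdot\cos\theta_{01}\,\delta_{\alpha^0\alpha'}.$$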
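If we substitute this expression for† $\hbar/2\pi m\cdot\cos\theta_{01}\,\delta_{\alpha^0\alpha'}$ in (29), we obtain,
after a straightforward reduction making use of identical relations between the ν’s,

$$\frac{(2\pi e)^4}{h^2c^4}\,\nu^0\nu'^3\,\Big|\sum_{\alpha''}\Big\{\frac{(\alpha'|x_1|\alpha'')(\alpha''|x_0|\alpha^0)}{\nu^0 - \nu(\alpha'', \alpha^0)} - \frac{(\alpha'|x_0|\alpha'')(\alpha''|x_1|\alpha^0)}{\nu' + \nu(\alpha'', \alpha^0)}\Big\}\Big|^2. \tag{32}$$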
This gives the scattering coefficient in the form of the effective area that a photon
has to hit per unit solid angle of scattering. It is known as the Kramers-Heisenberg 
dispersion formula, having been first obtained by these authors from analogies with 
the classical theory of dispersion. 
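The "straightforward reduction" rests on an algebraic identity between the frequencies, which can be confirmed with exact rational arithmetic. The sketch below assumes that the identical relation in question is $\nu^0 = \nu' + \nu(\alpha', \alpha'') + \nu(\alpha'', \alpha^0)$, as follows from the conservation of energy together with the additivity of the atomic frequencies; the function and variable names are hypothetical. After the substitution, the coefficient of $(\alpha'|x_1|\alpha'')(\alpha''|x_0|\alpha^0)$ is `first` and that of $-(\alpha'|x_0|\alpha'')(\alpha''|x_1|\alpha^0)$ is `second`.

```python
from fractions import Fraction


def bracket_coefficients(nu_scattered, nu_final_mid, nu_mid_initial):
    """Coefficients of the two products of matrix elements after the substitution.

    nu_final_mid stands for nu(alpha', alpha'') and nu_mid_initial for
    nu(alpha'', alpha0); the incident frequency follows from energy conservation.
    """
    nu_incident = nu_scattered + nu_final_mid + nu_mid_initial
    product = nu_final_mid * nu_mid_initial
    first = (nu_scattered + nu_mid_initial) - product / (nu_incident - nu_mid_initial)
    second = (nu_scattered + nu_final_mid) - product / (nu_scattered + nu_mid_initial)
    return nu_incident, first, second


# Arbitrary rational test values; a negative atomic frequency is allowed.
nu_scattered = Fraction(5, 3)
nu_final_mid = Fraction(-1, 2)
nu_mid_initial = Fraction(7, 4)
nu_incident, first, second = bracket_coefficients(nu_scattered, nu_final_mid, nu_mid_initial)
```

Both coefficients reduce to the product of the incident and scattered frequencies divided by the corresponding denominator of (32), which is how the factor $\nu'/\nu^0$ of (29) turns into $\nu^0\nu'^3$.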

The fact that the various terms in (29) can be combined to give 
the result (32) justifies the assumption made in deriving formula (44) of 
Chapter X, that the matrix elements $(p'\alpha'|V|p''\alpha'')$ of the interaction energy are of
the second order of smallness compared with the $(p'\alpha'|V|k)$ ones, at any rate when
the scattered particles are photons. 


72. Einstein’s Laws of Radiation 


In the preceding section we determined the probability coefficients for absorption, 
emission and scattering of a photon by an atom. We were there concerned 
with only a single photon interacting with the atom (or at most with two), 
the interaction energy being given by (24). To complete our theory of radiation 
we require to know the laws governing the interaction of a number of photons 
with the atom. If the atom is exposed to an incident beam of radiation containing 
many photons, how do the absorption, emission and scattering probabilities depend 
on the intensity of this beam? 

This question cannot, of course, be answered simply from a consideration of 
the interaction energy, defined by (24), for a single photon. We have to rely* on 
the general interaction energy (16) for a number of photons, and this requires 
incidentally that we must perform the passage from sums to integrals once again. 
We make use of the general result (28) of §54, according to which a transition 
probability is proportional to the square of the modulus of the matrix element of 
the perturbing energy that refers to this transition. 

Let us consider an absorption process in which the number of photons in state
a is reduced from $n_a$ to $n_a - 1$, the atom simultaneously jumping from state $\alpha^0$ to
state $\alpha'$. The probability of such a process will be proportional to the square of
the modulus of the matrix element

$$(n_1n_2\ldots n_a\ldots\alpha^0|H_Q|n_1n_2\ldots n_a - 1\ldots\alpha')$$





t*? replaces ‘.’ 
*‘rely’ replaces ‘fall back’ 
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of the total interaction energy Hg. The only term in the expression (16) for Hg 
which can contribute to this matrix element is V,n2e"’s. This matrix element 
is thus proportional to n? and the transition probability is proportional to nq. 
The passage from sums to integrals is now quite trivial, the final result being 
that the probability of an absorption process is proportional to the intensity of 
the incident radiation. 

Similarly for an emission process, in which the number of photons in state $a$ is 
increased from $n_a$ to $n_a + 1$, we must consider the matrix element 

$$(n_1 n_2 \ldots n_a \ldots \alpha^0 \,|\, H_Q \,|\, n_1 n_2 \ldots n_a + 1 \ldots \alpha').$$

The only term in expression (16) that contributes to this is $V_a (n_a + 1)^{1/2} e^{-i\theta_a/\hbar}$. 
This matrix element is thus proportional to $(n_a + 1)^{1/2}$ and the transition probability 
to $n_a + 1$. In the same way a scattering process, in which the number of photons 
in state $a$ is decreased from $n_a$ to $n_a - 1$ and that in state $b$ is increased from $n_b$ to 
$n_b + 1$, is due to the term $V_{ab}\, n_a^{1/2} e^{i\theta_a/\hbar} (n_b + 1)^{1/2} e^{-i\theta_b/\hbar}$, if it is a true scattering process, 
and to the product of the two terms $V_a n_a^{1/2} e^{i\theta_a/\hbar}$ and $V_b (n_b + 1)^{1/2} e^{-i\theta_b/\hbar}$, if otherwise. 
The scattering probability is thus in any case proportional to $n_a(n_b + 1)$. 
To interpret these results we must now make an accurate passage from the discrete 
to the continuous ranges of stationary states for the photons. 
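The factors $n_a$, $n_a + 1$ and $n_a(n_b + 1)$ found above can be illustrated with a small numerical sketch. It assumes the modern ladder-operator matrices in a truncated photon-number basis in place of the operators of (16); the function names and the truncation `DIM` are hypothetical choices made only for this illustration.

```python
import math

def lowering(dim):
    """Photon-removing operator in the basis n = 0 .. dim-1.

    The only non-zero entries are [n-1][n] = sqrt(n).
    """
    b = [[0.0] * dim for _ in range(dim)]
    for n in range(1, dim):
        b[n - 1][n] = math.sqrt(n)
    return b

def transpose(m):
    return [list(row) for row in zip(*m)]

DIM = 6                   # truncation of the number basis (assumption)
b = lowering(DIM)         # absorption: n -> n - 1
b_dag = transpose(b)      # emission:   n -> n + 1 (the matrix is real)

def absorption_weight(n):
    """Squared modulus of the matrix element connecting n to n - 1."""
    return b[n - 1][n] ** 2

def emission_weight(n):
    """Squared modulus of the matrix element connecting n to n + 1."""
    return b_dag[n + 1][n] ** 2

def scattering_weight(n_a, n_b):
    """One photon removed from state a, one added to state b."""
    return absorption_weight(n_a) * emission_weight(n_b)
```

With these definitions `emission_weight(0)` does not vanish, which is the spontaneous emission, while `absorption_weight(0)` does.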

Suppose we have a distribution $n_a$ of photons over the discrete states $a$. 
To obtain the density of these photons (in ordinary space) we may 
suppose them to be represented by a Schrödinger function $(p|) = n_p^{1/2}$, 
and transform this Schrödinger function to the $(x, y, z)$-representation by means 
of the transformation function $(x|p)$. This transformation function must now 
have the value 

$$(x|p) = h^{-3/2}\, e^{i(\mathbf{p},\mathbf{x})/\hbar}\, \sigma_p^{1/2},$$

differing from the value given by (36) of Chapter VI by the factor $\sigma_p^{1/2}$, on account of 
the weight function of our present $p$-representation differing from that of the usual 
one by the factor $\sigma_p$, as was discussed in obtaining equation (17). Thus 

$$(x|) = \sum_p (x|p)(p|) = h^{-3/2} \sum_p e^{i(\mathbf{p},\mathbf{x})/\hbar}\, n_p^{1/2} \sigma_p^{1/2}.$$

Suppose $n_p$ has the value unity for one state $p$ and zero for all the others. We shall 
then have 

$$(x|) = h^{-3/2}\, e^{i(\mathbf{p},\mathbf{x})/\hbar}\, \sigma_p^{1/2}$$

and the density of photons will be 

$$|(x|)|^2 = h^{-3} \sigma_p.$$

For an arbitrary distribution $n_a$ of the photons over the discrete states $a$, 
the photon density will be given by addition of the contributions from each state 
and will therefore be 

$$h^{-3} \sum_a n_a \sigma_a = h^{-3} \int n_a \, dp_x\, dp_y\, dp_z.$$

Thus the number of photons per unit volume per unit of momentum space is $h^{-3} n_a$, 
corresponding to an energy $h^{-2} \nu_a n_a$ per unit volume per unit of momentum space. 
The intensity per unit frequency range, equal to $c$ times the energy density per 
unit solid angle per unit frequency range, is therefore 

$$I_a = h \nu_a^3 n_a / c^2.$$


The probability for an emission process, which we found was proportional to 
$n_a + 1$, is thus proportional to $I_a + h\nu_a^3/c^2$. This means that with no incident 
radiation there is still a certain amount of emission (which is, in fact, given by 
expression (26)), but that the emission is increased or stimulated by incident 
radiation in the same direction and having the same frequency (and state of 
polarization) as the emitted radiation under consideration. Our present theory 
of radiation thus completes the imperfect one of §53, and gives a ratio for 
the stimulated and spontaneous emissions in agreement with Einstein’s laws of 
radiation discussed at the end of §53. 
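The relation between the intensity and the number of photons per state may be made concrete as follows. This is a minimal sketch, assuming CGS values for $h$ and $c$ and the relation $I = h\nu^3 n/c^2$ for one state of polarization; the function names are hypothetical.

```python
H_PLANCK = 6.62607015e-27   # erg s (CGS value, assumed here)
C_LIGHT = 2.99792458e10     # cm / s

def intensity(n, nu):
    """I = h nu^3 n / c^2 for n photons per state at frequency nu (in Hz)."""
    return H_PLANCK * nu ** 3 * n / C_LIGHT ** 2

def photons_per_state(i_nu, nu):
    """Inverse relation: n = I c^2 / (h nu^3)."""
    return i_nu * C_LIGHT ** 2 / (H_PLANCK * nu ** 3)

def emission_factor(i_nu, nu):
    """Quantity to which the emission probability is proportional."""
    return i_nu + H_PLANCK * nu ** 3 / C_LIGHT ** 2

def stimulated_to_spontaneous(i_nu, nu):
    """Ratio of stimulated to spontaneous emission, equal to n."""
    return photons_per_state(i_nu, nu)
```

The ratio returned by `stimulated_to_spontaneous` is simply the number of photons per state, so that stimulated emission dominates only when that number exceeds unity.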

The probability for a scattering process from state $a$ to state $b$, which we found 
was proportional to $n_a(n_b + 1)$, is in the same way proportional to $I_a(I_b + h\nu_b^3/c^2)$. 
Thus the scattering of radiation is also stimulated by incident radiation in 
the same direction and having the same frequency as the scattered radiation. 
The stimulation phenomenon is, in fact, a general one, as has been shown by 
Albert Einstein and Paul Ehrenfest† from general statistical arguments. 





†Einstein, A., Ehrenfest, P. Zur Quantentheorie des Strahlungsgleichgewichts. Z. Physik 19, 
301-306 (1923). https://doi.org/10.1007/BF01327565 See also Pauli, W. Über das thermische 
Gleichgewicht zwischen Strahlung und freien Elektronen. Z. Physik 18, 272-286 (1923). 


XIII. RELATIVITY THEORY OF 
THE ELECTRON 


73. Relativity Treatment of a Single Particle 


Our theory of special dynamical systems from Chapter VI onwards was essentially 
non-relativistic. We worked all the time with one particular Lorentz frame of 
reference and did not make it an essential requirement of the theory that its results 
should be independent of this frame. Let us now inquire into what sort of 
modifications we may expect relativity to introduce. 

It is fairly certain that the general theory of states and observables developed 
in Chapters II-V will apply also to relativity treatments of dynamical systems. 
We are faced with the problem, however, of deciding with what observables we shall 
now work. There are serious disadvantages in taking these observables to be 
the values, $\xi_t$ say, of dynamical variables $\xi$ at the time $t$. If the $\xi_t$'s occur in 
our analysis, they would have to appear on the same footing as the $\xi_\tau$'s, the values 
of the $\xi$'s at the time $\tau$ in some other Lorentz frame. We should therefore require 
to know the relations between the $\xi_t$'s and the $\xi_\tau$'s, and these would in general be 
very complicated and artificial, as they would require us to connect distant parts of 
space-time. In any case the $\xi_t$'s are not quantities that could easily be observed and 
we should not expect them to play any fundamental role in the theory. A possible 
way out of the difficulty would be to build up a purely field theory and to take 
as observables the values of the field quantities at definite points in space-time. 
This appears to be the most straightforward way of dealing with general dynamical 
systems on relativity lines, but it involves complicated mathematics and appears 
to be too difficult for practical application.§ 

§See Heisenberg, W., Pauli, W. Zur Quantendynamik der Wellenfelder. Z. Physik 56, 1-61 
(1929). https://doi.org/10.1007/BF01340129; Heisenberg, W., Pauli, W. Zur Quantentheorie 
der Wellenfelder. II. Z. Physik 59, 168-190 (1930). https://doi.org/10.1007/BF01341423. 

The difficulty of a relativity treatment becomes much less severe when 
one confines one’s attention to the problem of a single particle moving in a given 
field of force. If we now take a representation in which the observables $x_t$, $y_t$, $z_t$ 
specifying the position of the particle at time $t$ are diagonal, we have as the wave 
function representing a state a function $(x_t y_t z_t|)$ of the three variables $x_t$, $y_t$, $z_t$ 
depending on the parameter $t$, which is the same as a function $(xyzt|)$ of the four 
variables $x$, $y$, $z$, $t$. The domain of our wave function thus becomes identical 
with the ordinary space-time continuum, and this circumstance makes possible 
an elementary treatment of the problem and allows us to use considerations 
which cannot be extended to more general dynamical systems. We may expect, 
for instance, the physical conditions at any point in space-time to depend only on 
the value of the wave function at that point and neighbouring points, and thus 
the wave function, if not actually invariant under a Lorentz transformation, 
should transform according to simple laws. 

Let us now see how we can bring the momentum of the particle into the theory. 
The value of a component of momentum at a specified time is an observable 
of a rather artificial kind, even in the case of a system with a single particle, 
and we should not expect it to play an important role. This observable, 
we saw in §36, is connected with a certain space-displacement operator, which, 
when it operates on any wave function, produces at the specified time, just a spatial 
displacement, the value of the new wave function at any other time being then 
determined by the wave equation. It would seem more natural in a relativity 
theory to deal with an operator which produces at all times simply a spatial 
displacement of the wave function, such an operator being essentially a simple 
partial differentiation of the type $\partial/\partial x$ of the wave function $(xyzt|)$ in four 
variables. The result of such an operator operating on a wave function is a new 
wave function which in general does not satisfy the wave equation and hence 
does not represent a state of the system, so that this operator is not an observable. 
All the same we may expect the operator $-i\hbar\,\partial/\partial x$ to play the part of a momentum 
in the theory, in spite of the fact that since it refers to momentum in general and not 
momentum at a particular time, we can give no precise meaning to an observation 
of it. 

Thus we are led to introduce the operators 

$$p_x = -i\hbar\frac{\partial}{\partial x}, \qquad p_y = -i\hbar\frac{\partial}{\partial y}, \qquad p_z = -i\hbar\frac{\partial}{\partial z}, \qquad (1)$$

and also the corresponding 

$$W = i\hbar\frac{\partial}{\partial t}, \qquad (2)$$

referring to time displacement, to play the part of momentum and energy. 
They can operate on any wave function, but since the result of such operation 
does not satisfy the wave equation and does not represent a state, they are not 
observables. All the same they may be used in algebraic analysis like observables 
and will satisfy all the axioms of ordinary algebra except the commutative 
law of multiplication. The complete algebraic scheme of Chapter II will not, 
however, apply, since we cannot interpret $\phi\alpha\psi$ as a number when $\alpha$ is an operator of 
this more general kind. It will be more convenient in the present chapter to regard 
the symbols $\psi$, $\phi$, &c., not in the abstract sense of Chapter I, but as wave functions 
and linear operators in the $x$, $y$, $z$, $t$ representation. 


74. The Wave Equation for the Electron 


Let us consider first the case of the motion of an electron in the absence of 
an electromagnetic field, so that the problem is simply that of the free particle, 
which was discussed in §39. The Hamiltonian for this system provided by classical 
mechanics is given by equation (1) of §39, and this leads to the wave equation (5) 
of that section. This wave equation may be written 


$$\{W/c - (m^2c^2 + p_x^2 + p_y^2 + p_z^2)^{1/2}\}\psi = 0, \qquad (3)$$


where W and the p’s are to be interpreted as operators in accordance with 
equations (1) and (2). Equation (3), although it takes into account correctly 
the variation of the mass of the particle with its velocity, is yet unsatisfactory 
from the point of view of relativity, because it is very unsymmetrical between W 
and the p’s, so much so that one cannot generalize it in a relativistic way to the case 
when there is a field present. We must therefore look for a new wave equation for 
the free particle. 

If we multiply the wave equation (3) on the left by the operator 
$\{W/c + (m^2c^2 + p_x^2 + p_y^2 + p_z^2)^{1/2}\}$, we obtain the equation 


$$\{W^2/c^2 - m^2c^2 - p_x^2 - p_y^2 - p_z^2\}\psi = 0, \qquad (4)$$


which is of a relativistically invariant form and may therefore more conveniently 
be taken as the basis of a relativity theory. Equation (4) is not completely 
equivalent to equation (3) since, although every solution of (3) is also a solution 
of (4), the converse is not true. Only those solutions of (4) belonging to positive 
values for W are also solutions of (3). 
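The statement that (4) admits twice as many solutions as (3) can be checked for a plane wave, on which $W$ and the $p$'s act as ordinary numbers. The following is a minimal sketch under that assumption; the function names and the units in the usage are hypothetical.

```python
import math

def energy_roots(p, m, c):
    """Both values of W allowed by (4) for a plane wave of momentum p = (px, py, pz)."""
    root = c * math.sqrt(m * m * c * c + sum(q * q for q in p))
    return root, -root

def residual_3(w, p, m, c):
    """Left-hand side of (3) with the operators replaced by their eigenvalues."""
    return w / c - math.sqrt(m * m * c * c + sum(q * q for q in p))

def residual_4(w, p, m, c):
    """Left-hand side of (4) with the operators replaced by their eigenvalues."""
    return w * w / (c * c) - m * m * c * c - sum(q * q for q in p)
```

For any momentum both roots make `residual_4` vanish, while `residual_3` vanishes only for the positive root.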

The wave equation (4) is not in agreement with the general laws of the quantum 
theory on account of its being quadratic in $W$. In §37 we deduced from quite 
general arguments that the wave equation must be linear in the operator $\partial/\partial t$ 
or $W$, like equation (43) of that section. We therefore seek a wave equation that is 
linear in $W$ and that is roughly equivalent to (4). In order that this wave equation 
shall transform in a simple way under a Lorentz transformation, we try to arrange 
that it shall be rational and linear in $p_x$, $p_y$ and $p_z$ as well as in $W$, and thus of 
the form 

$$\{W/c + \alpha_x p_x + \alpha_y p_y + \alpha_z p_z + \beta\}\psi = 0, \qquad (5)$$

where the $\alpha$'s and $\beta$ are independent of $W$ and the $p$'s. Since we are considering 
the case of no field, all points in space-time must be equivalent, so that the operator 
in the wave equation must not involve $x$, $y$, $z$ or $t$. Thus the $\alpha$'s and $\beta$ must also 
be independent of $x$, $y$, $z$ and $t$. They must therefore denote some quite new 
dynamical variables, which may be pictured as describing some internal motion in 
the electron. We shall see later that they just describe the spin of the electron. 
The $\alpha$'s and $\beta$ must, of course, commute with $W$ and the $p$'s and also with $x$, $y$, 
$z$ and $t$. 

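Multiplying (5) by the operator $\{W/c - \alpha_x p_x - \alpha_y p_y - \alpha_z p_z - \beta\}$ on the left, 
we obtain 

$$\Big\{W^2/c^2 - \sum_{x,y,z}\big[\alpha_x^2 p_x^2 + (\alpha_x\alpha_y + \alpha_y\alpha_x)p_x p_y + (\alpha_x\beta + \beta\alpha_x)p_x\big] - \beta^2\Big\}\psi = 0.$$

This is the same as (4) if the $\alpha$'s and $\beta$ satisfy the relations 

$$\alpha_x^2 = 1, \qquad \alpha_x\alpha_y + \alpha_y\alpha_x = 0,$$
$$\beta^2 = m^2c^2, \qquad \alpha_x\beta + \beta\alpha_x = 0,$$

together with the relations obtained from these by permuting $x$, $y$ and $z$. If we write 

$$\beta = \alpha_m mc,$$

these relations may be summed up in the single one, 

$$\alpha_\mu\alpha_\nu + \alpha_\nu\alpha_\mu = 2\delta_{\mu\nu} \qquad (\mu, \nu = x, y, z \text{ or } m). \qquad (6)$$

The four $\alpha$'s all anticommute with one another and the square of each is unity. 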

Thus by giving suitable properties to the $\alpha$'s and $\beta$ we can make the wave 
equation (5) equivalent to (4), in so far as the motion of the electron as a whole 
is concerned. We may now assume (5) is the correct relativity wave equation 
for the motion of an electron in the absence of a field. This gives rise to one 
difficulty, however, owing to the fact that (5), like (4), is not exactly equivalent 
to (3), but allows solutions corresponding to negative as well as positive values 
of $W$. The former do not, of course, correspond to any actually observable 
motion of an electron. For the present we shall simply evade the difficulty by 
ignoring the negative-energy solutions. Their proper physical interpretation will 
be discussed in §79. 

We can easily obtain a representation of the four $\alpha$'s. They have similar 
algebraic properties to the $\sigma$'s introduced in §43 to describe the spin of an electron, 
which $\sigma$'s can be represented by matrices with two rows and columns. So long as 
we keep to matrices with two rows and columns we cannot get a representation 
of more than three anticommuting quantities, and we have to go to four rows and 
columns to get a representation of the four anticommuting $\alpha$'s. It is convenient 
first to express the $\alpha$'s in terms of the $\sigma$'s and also of a second similar set 
of three anticommuting observables whose squares are unity, $\rho_1$, $\rho_2$, $\rho_3$ say, 
that are independent of and commute with the $\sigma$'s. We may take, amongst other 
possibilities, 

$$\alpha_x = \rho_1\sigma_x, \qquad \alpha_y = \rho_1\sigma_y, \qquad \alpha_z = \rho_1\sigma_z, \qquad \alpha_m = \rho_3, \qquad (7)$$

and the $\alpha$'s will then satisfy all the relations (6), as may easily be verified. If we 
now take a representation with $\rho_3$ and $\sigma_z$ diagonal, we shall get the following 
scheme of matrices: 

$$\sigma_x = \begin{pmatrix} 0&1&0&0\\ 1&0&0&0\\ 0&0&0&1\\ 0&0&1&0 \end{pmatrix}, \qquad \sigma_y = \begin{pmatrix} 0&-i&0&0\\ i&0&0&0\\ 0&0&0&-i\\ 0&0&i&0 \end{pmatrix}, \qquad \sigma_z = \begin{pmatrix} 1&0&0&0\\ 0&-1&0&0\\ 0&0&1&0\\ 0&0&0&-1 \end{pmatrix},$$

$$\rho_1 = \begin{pmatrix} 0&0&1&0\\ 0&0&0&1\\ 1&0&0&0\\ 0&1&0&0 \end{pmatrix}, \qquad \rho_2 = \begin{pmatrix} 0&0&-i&0\\ 0&0&0&-i\\ i&0&0&0\\ 0&i&0&0 \end{pmatrix}, \qquad \rho_3 = \begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&-1&0\\ 0&0&0&-1 \end{pmatrix}.$$

Corresponding to the four rows and columns, the wave function must have four 
components. We saw in §43 that the spin of the electron requires the wave function 
to have two components. The fact that our present theory gives four is due to our 
wave equation (5) having twice as many solutions as it ought to have, half of them 
corresponding to states of negative energy. 
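That the choice (7) satisfies the relations (6) may be verified by direct matrix multiplication. The sketch below assumes that the $\rho$'s act on the left factor and the $\sigma$'s on the right factor of a tensor product of two-rowed matrices, which reproduces a representation with $\rho_3$ and $\sigma_z$ diagonal; the helper names are hypothetical.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def kron(a, b):
    """Tensor (Kronecker) product of two matrices given as nested lists."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

I2 = [[1, 0], [0, 1]]
PAULI = {
    "x": [[0, 1], [1, 0]],
    "y": [[0, -1j], [1j, 0]],
    "z": [[1, 0], [0, -1]],
}

# Assumed layout: rho on the left tensor factor, sigma on the right one.
SIGMA = {k: kron(I2, m) for k, m in PAULI.items()}
RHO = {1: kron(PAULI["x"], I2), 2: kron(PAULI["y"], I2), 3: kron(PAULI["z"], I2)}

# Equation (7): alpha_x, alpha_y, alpha_z = rho_1 sigma, alpha_m = rho_3.
ALPHA = {k: matmul(RHO[1], SIGMA[k]) for k in "xyz"}
ALPHA["m"] = RHO[3]

def satisfies_relation_6():
    """Check alpha_mu alpha_nu + alpha_nu alpha_mu = 2 delta_mu_nu for all pairs."""
    identity = kron(I2, I2)
    for mu in ALPHA:
        for nu in ALPHA:
            anti = add(matmul(ALPHA[mu], ALPHA[nu]), matmul(ALPHA[nu], ALPHA[mu]))
            expected = [[2 * v if mu == nu else 0 for v in row] for row in identity]
            if anti != expected:
                return False
    return True
```

All entries are 0, ±1 or ±i, so the arithmetic is exact and the comparison needs no tolerance.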

With the help of (7), the wave equation (5) may be written in the vector form 

$$\{W/c + \rho_1(\boldsymbol{\sigma}, \mathbf{p}) + \rho_3 mc\}\psi = 0.$$

To generalize this equation to the case when there is an electromagnetic 
field present, we follow the classical rule of replacing $W$ and $\mathbf{p}$ by $W + eA_0$ and 
$\mathbf{p} + e/c\cdot\mathbf{A}$, $A_0$ and $\mathbf{A}$ being the scalar and vector potentials of the field at the place 
where the electron is. This gives us the equation 

$$\Big\{\frac{W}{c} + \frac{e}{c}A_0 + \rho_1\Big(\boldsymbol{\sigma}, \mathbf{p} + \frac{e}{c}\mathbf{A}\Big) + \rho_3 mc\Big\}\psi = 0, \qquad (8)$$

which is the fundamental wave equation of the relativity theory of the electron. 
The conjugate imaginary equation 

$$\phi\Big\{\frac{W}{c} + \frac{e}{c}A_0 + \rho_1\Big(\boldsymbol{\sigma}, \mathbf{p} + \frac{e}{c}\mathbf{A}\Big) + \rho_3 mc\Big\} = 0 \qquad (9)$$

must be treated on the same footing as (8). The operators $W$ and $\mathbf{p}$ in (9), 
which operate to the left, must be interpreted, according to §§36 and 37, as having 
the meanings in equations (1) and (2) with the signs reversed. 







75. Invariance under a Lorentz Transformation 


Before proceeding to discuss the physical consequences of the wave equation (8) 
or (9), we shall first verify that our theory really is invariant under a Lorentz 
transformation, or, stated more accurately, that the physical results the theory 
leads to are independent of the Lorentz frame of reference used. This is not by any 
means obvious from the form of the wave equation (8). We have to verify that, 
if we write down the wave equation in a different Lorentz frame, the solutions 
of the new wave equation may be put into one-one correspondence with those 
of the original one in such a way that corresponding solutions may be assumed 
to represent the same state. For either Lorentz frame, the square of the modulus 
of the wave function, summed for the four components, gives the probability 
per unit volume of the electron being at any given place in that Lorentz frame. 
This probability is of the nature of an electric density (and will be called the electric 
density in future, for brevity), and its values, calculated in different Lorentz 
frames for wave functions representing the same state, should be connected like 
the time components in these frames of some 4-vector. Further, the 4-dimensional 
divergence of this 4-vector should vanish, signifying conservation of charge, or that 
the electron cannot appear or disappear in any volume without passing through 
the boundary. 

For discussing Lorentz transformations it is convenient to make a slight change 
in our notation. We shall use the suffixes 1, 2, 3 instead of $x$, $y$, $z$ and shall put 
$p_0$ for $W/c$, and we shall also use the convention that terms containing a repeated 
suffix are to be summed over the values 0...3 for that suffix. We can now write 
equation (8) in the form 

$$\{\alpha_\mu(p_\mu + e/c\cdot A_\mu) + \alpha_m mc\}\psi = 0, \qquad (10)$$

$\alpha_0$ being equal to unity, and similarly we can write equation (9) in the form 

$$\phi\{\alpha_\mu(p_\mu + e/c\cdot A_\mu) + \alpha_m mc\} = 0. \qquad (11)$$


We now apply a Lorentz transformation and denote quantities referring to 
the new frame by a star. The components of the 4-vectors $p$ and $A$ will transform 
according to a linear law of the type 

$$p_\mu = a_{\mu\nu} p^*_\nu, \qquad A_\mu = a_{\mu\nu} A^*_\nu. \qquad (12)$$

Substituting these expressions for $p_\mu$ and $A_\mu$ in equations (10) and (11), we obtain 

$$\{\alpha_\mu a_{\mu\nu}(p^*_\nu + e/c\cdot A^*_\nu) + \alpha_m mc\}\psi = 0 \qquad (13)$$

and 

$$\phi\{\alpha_\mu a_{\mu\nu}(p^*_\nu + e/c\cdot A^*_\nu) + \alpha_m mc\} = 0. \qquad (14)$$

We now try to bring these equations back to the form of the original (10) and (11) 
by introducing a new wave function $\psi^*$, whose four components are linear functions 
(with constant numerical coefficients) of the four components of the original $\psi$. 
This means that $\psi^*$ is connected with $\psi$ by an equation of the type 

$$\psi^* = \gamma\psi, \qquad (15)$$

where $\gamma$ is an operator, like the $\alpha$'s, which can be represented as a matrix with 
four rows and columns. The conjugate imaginary equation to (15) is 

$$\phi^* = \phi\bar\gamma.$$

Equations (13) and (14) will go over into the equations 

$$\bar\gamma\{\alpha_\nu(p^*_\nu + e/c\cdot A^*_\nu) + \alpha_m mc\}\psi^* = 0 \qquad (16)$$

and 

$$\phi^*\{\alpha_\nu(p^*_\nu + e/c\cdot A^*_\nu) + \alpha_m mc\}\gamma = 0, \qquad (17)$$

provided we can choose $\gamma$ such that§ 

$$\bar\gamma\alpha_\nu\gamma = \alpha_\mu a_{\mu\nu}, \qquad \bar\gamma\alpha_m\gamma = \alpha_m. \qquad (18)$$

These equations (16) and (17) are of the same form as (10) and (11), as required, 
since one can divide out by the extra factors $\bar\gamma$ and $\gamma$. 

§Original has the suffix $\mu\nu$ displaced incorrectly 

In order to verify that we can always choose $\gamma$ to satisfy equations (18), 
let us first take the special case when the change of our frame of reference 
consists simply of a rotation through a hyperbolic angle $\theta$ in the $xt$ plane, so that 
the transformation equations for the components of a 4-vector are of the type 

$$p_0 = p^*_0\cosh\theta + p^*_1\sinh\theta,$$
$$p_1 = p^*_0\sinh\theta + p^*_1\cosh\theta, \qquad (19)$$
$$p_2 = p^*_2, \qquad p_3 = p^*_3.$$

The values of the $a_{\mu\nu}$ may be written down at once from a comparison of 
these equations with (12). With these values for the $a_{\mu\nu}$, it is easy to see that 
equations (18) hold when we take 

$$\gamma = e^{\frac12\theta\alpha_1}. \qquad (20)$$

We have, in fact, 

$$\bar\gamma\alpha_0\gamma = \bar\gamma\gamma = e^{\theta\alpha_1} = 1 + \theta\alpha_1 + \theta^2\alpha_1^2/2! + \theta^3\alpha_1^3/3! + \cdots.$$

On account of $\alpha_1^2 = 1$, this reduces to 

$$\bar\gamma\alpha_0\gamma = \{1 + \theta^2/2! + \cdots\} + \alpha_1\{\theta + \theta^3/3! + \cdots\}$$
$$= \cosh\theta + \alpha_1\sinh\theta$$
$$= \alpha_0\cosh\theta + \alpha_1\sinh\theta.$$

Again, 

$$\bar\gamma\alpha_1\gamma = \alpha_1\bar\gamma\gamma = \alpha_0\sinh\theta + \alpha_1\cosh\theta.$$

Further, 

$$\bar\gamma\alpha_2\gamma = e^{\frac12\theta\alpha_1}\alpha_2 e^{\frac12\theta\alpha_1} = e^{\frac12\theta\alpha_1} e^{-\frac12\theta\alpha_1}\alpha_2 = \alpha_2,$$

since $\alpha_2$ anticommutes with $\alpha_1$, which results in $\alpha_2 f(\alpha_1) = f(-\alpha_1)\alpha_2$ for any 
function $f(\alpha_1)$ of $\alpha_1$. Similarly 

$$\bar\gamma\alpha_3\gamma = \alpha_3, \qquad \bar\gamma\alpha_m\gamma = \alpha_m.$$

Thus the five equations (18) hold with $\gamma$ given by (20) when the $a_{\mu\nu}$ are given 
by (19). 

As a second typical change of the frame of reference, we may consider a rotation 
through an angle $\theta$ in ordinary space about the $x$-axis. The transformation 
equations are now 

$$p_0 = p^*_0, \qquad p_1 = p^*_1,$$
$$p_2 = p^*_2\cos\theta + p^*_3\sin\theta,$$
$$p_3 = -p^*_2\sin\theta + p^*_3\cos\theta.$$

With the new values for the $a_{\mu\nu}$, we can easily verify that equations (18) hold with 

$$\gamma = e^{\frac12\theta\alpha_3\alpha_2}, \qquad \bar\gamma = e^{-\frac12\theta\alpha_3\alpha_2} = e^{\frac12\theta\alpha_2\alpha_3},$$

the analysis being very similar to the preceding case. 

If two changes of the frame of reference are made consecutively, we simply have 
to multiply the corresponding $\gamma$'s to get the $\gamma$ for the resultant change. Now any 
change of the frame of reference may be built up from two rotations of the types 
we have considered, and hence there will always be a $\gamma$ satisfying (18). 
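The two typical cases can be checked numerically. The sketch below assumes the representation (7) built from tensor products, takes $\bar\gamma$ to be the Hermitian conjugate matrix of $\gamma$, and uses the closed forms $e^{\frac12\theta\alpha_1} = \cosh\frac12\theta + \alpha_1\sinh\frac12\theta$ and $e^{\frac12\theta\alpha_3\alpha_2} = \cos\frac12\theta + \alpha_3\alpha_2\sin\frac12\theta$, which follow from $\alpha_1^2 = 1$ and $(\alpha_3\alpha_2)^2 = -1$. The function names are hypothetical.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def kron(a, b):
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(s, a):
    return [[s * x for x in row] for row in a]

def dagger(a):
    """Hermitian conjugate, used here for the conjugate imaginary of gamma."""
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

I2 = [[1, 0], [0, 1]]
SX, SY, SZ = [[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]
ID4 = kron(I2, I2)
RHO1, RHO3 = kron(SX, I2), kron(SZ, I2)
# alpha_0 = 1, alpha_1..3 = rho_1 sigma_{x,y,z}, alpha_m = rho_3
ALPHA = [ID4] + [matmul(RHO1, kron(I2, s)) for s in (SX, SY, SZ)]
ALPHA_M = RHO3

def boost(theta):
    """gamma of (20) and the coefficients a[mu][nu] read off from (19)."""
    ch, sh = math.cosh(theta), math.sinh(theta)
    gamma = add(scale(math.cosh(theta / 2), ID4), scale(math.sinh(theta / 2), ALPHA[1]))
    return gamma, [[ch, sh, 0, 0], [sh, ch, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def rotation(theta):
    """gamma = exp(theta alpha_3 alpha_2 / 2) for the rotation in the 2-3 plane."""
    c, s = math.cos(theta), math.sin(theta)
    gamma = add(scale(math.cos(theta / 2), ID4),
                scale(math.sin(theta / 2), matmul(ALPHA[3], ALPHA[2])))
    return gamma, [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, c, s], [0, 0, -s, c]]

def satisfies_18(gamma, a):
    """Check gbar alpha_nu gamma = alpha_mu a[mu][nu] and gbar alpha_m gamma = alpha_m."""
    gbar = dagger(gamma)
    for nu in range(4):
        lhs = matmul(matmul(gbar, ALPHA[nu]), gamma)
        rhs = scale(0, ID4)
        for mu in range(4):
            rhs = add(rhs, scale(a[mu][nu], ALPHA[mu]))
        if not close(lhs, rhs):
            return False
    return close(matmul(matmul(gbar, ALPHA_M), gamma), ALPHA_M)
```

A call such as `satisfies_18(*boost(0.7))` tests all five equations (18) at once; pairing a $\gamma$ with the wrong set of coefficients makes the check fail.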

In this way we see that the solutions of the wave equation in the new frame of 
reference, equation (16), can be put into a natural one-one correspondence with 
those of the original wave equation (10), corresponding solutions being connected 
by (15), and we may assume that corresponding solutions represent the same state. 
It remains for us to verify that the electric density transforms like the time 
component of a 4-vector and that the divergence of this 4-vector vanishes. 

We shall introduce the notation $\phi_a.\psi_b$ to denote the sum of the product of 
each of the four components of $\phi_a$ with the corresponding component of $\psi_b$. 
In the same way $\phi\xi.\eta\psi$, where $\xi$ and $\eta$ are any linear operators that can operate 
on the wave functions, will denote the sum of the product of each component 
of $\phi\xi$ with the corresponding component of $\eta\psi$. Our new symbols of the type 
$\phi\xi.\eta\psi$ are functions of $x$, $y$, $z$ and $t$, and are quite distinct from the products 
$\phi\xi\eta\psi$ of Chapter II, which products, we have seen, have in general no meaning for 
the more general type of linear operator with which we are now dealing. It should 
be noted that 

$$\phi\alpha.\psi = \phi.\alpha\psi \qquad (21)$$

when $\alpha$ is one of the $\alpha$'s in the wave equation, or more generally when it is any 
operator which means simply taking four linear functions (whose coefficients are 
numbers or functions of $x$, $y$, $z$ and $t$) of the four components of the wave function. 

We can now express the electric density as $\phi.\psi$, which is the same as $\phi.\alpha_0\psi$ or 
$\phi\alpha_0.\psi$ since $\alpha_0 = 1$. Let us see how the four quantities $\phi.\alpha_\mu\psi$, with $\mu = 0, \ldots, 3$, 
transform under a Lorentz transformation. We have, from (15) and (18), 

$$\phi^*.\alpha_\nu\psi^* = \phi\bar\gamma.\alpha_\nu\gamma\psi = \phi.\bar\gamma\alpha_\nu\gamma\psi$$
$$= \phi.\alpha_\mu a_{\mu\nu}\psi = (\phi.\alpha_\mu\psi)a_{\mu\nu}.$$

Comparing this result with (12), we see that the four quantities $\phi.\alpha_\mu\psi$ transform 
like the covariant components of a 4-vector. The contravariant components will be 

$$\phi.\psi, \qquad -\phi.\alpha_1\psi, \qquad -\phi.\alpha_2\psi, \qquad -\phi.\alpha_3\psi.$$

This verifies that our electric density $\phi.\psi$ is the time component of a 4-vector 
and that the corresponding space components are $-\phi.\alpha_r\psi$ (with $r = 1, 2, 3$). 
These space components give the electric current, or, more accurately, 
the probability of the electron crossing unit area per unit time. 

The divergence of our 4-vector is 

$$\pm\frac{\partial}{\partial x_\mu}(\phi.\alpha_\mu\psi), \qquad (22)$$

where $x_0$ denotes $ct$ and the $\pm$ sign means that the $+$ sign is to be taken for 
$\mu = 0$ and the $-$ sign for $\mu = 1, 2, 3$ before one does the summation. To prove this 
divergence vanishes, multiply equation (10) by $\phi$ and (11) by $\psi$, taking the sum 
over the four components in each case, and subtract. The result is 

$$\phi.\alpha_\mu p_\mu\psi - \phi\alpha_\mu p_\mu.\psi = 0,$$

the other terms cancelling on account of (21). With the help of (1) and (2) 
this gives 

$$\pm\Big\{\phi\alpha_\mu.\frac{\partial\psi}{\partial x_\mu} + \frac{\partial\phi}{\partial x_\mu}.\alpha_\mu\psi\Big\} = 0,$$

which just expresses the vanishing of (22). In this way we complete the proof that 
our theory gives consistent results in whichever frame of reference it is applied. 
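The transformation law of the four quantities $\phi.\alpha_\mu\psi$ may be illustrated for the hyperbolic rotation (19). The sketch assumes the representation (7) built from tensor products, takes $\phi$ to be the complex conjugate of $\psi$ component by component, and uses an arbitrary four-component $\psi$ at a single point; all names are hypothetical.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def kron(a, b):
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]

I2 = [[1, 0], [0, 1]]
SX, SY, SZ = [[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]
RHO1 = kron(SX, I2)
# alpha_0 = 1 and alpha_1..3 = rho_1 sigma_{x,y,z}
ALPHA = [kron(I2, I2)] + [matmul(RHO1, kron(I2, s)) for s in (SX, SY, SZ)]

def four_quantities(psi):
    """The quantities phi.alpha_mu psi, mu = 0..3, with phi the conjugate of psi."""
    phi = [z.conjugate() for z in psi]
    return [sum(p * q for p, q in zip(phi, matvec(a, psi))).real for a in ALPHA]

def starred(psi, theta):
    """psi* = gamma psi with gamma = cosh(theta/2) + alpha_1 sinh(theta/2)."""
    ch, sh = math.cosh(theta / 2), math.sinh(theta / 2)
    return [ch * u + sh * v for u, v in zip(psi, matvec(ALPHA[1], psi))]

def contracted_with_a(q, theta):
    """(phi.alpha_mu psi) a[mu][nu] with the a's read off from (19)."""
    ch, sh = math.cosh(theta), math.sinh(theta)
    return [q[0] * ch + q[1] * sh, q[0] * sh + q[1] * ch, q[2], q[3]]

def invariant(q):
    """Square of the time component minus squares of the space components."""
    return q[0] ** 2 - q[1] ** 2 - q[2] ** 2 - q[3] ** 2
```

Computing `four_quantities(starred(psi, theta))` and `contracted_with_a(four_quantities(psi), theta)` for the same `psi` gives the same four numbers, and `invariant` is unchanged by the transformation.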


76. Existence of the Spin 


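In §74 we saw that the correct wave equation for the electron in the absence 
of an electromagnetic field, namely equation (5), is equivalent to the wave 
equation (4) which is suggested from analogy with the classical theory. 
This equivalence no longer holds when there is a field. By treating the correct 
wave equation for this case, namely (8), in the same way as we treated (5) and 
comparing it with the wave equation to be expected from analogy with the classical 
theory, namely 

$$\Big\{\Big(\frac{W}{c} + \frac{e}{c}A_0\Big)^2 - \Big(\mathbf{p} + \frac{e}{c}\mathbf{A}\Big)^2 - m^2c^2\Big\}\psi = 0, \qquad (23)$$

in which the operator is just the classical relativity Hamiltonian, we may expect 
to get an indication of the new physical features of the present theory. 

We must multiply (8) by some factor on the left to make it resemble (23) as 
closely as possible. Taking this factor to be 

$$\frac{W}{c} + \frac{e}{c}A_0 - \rho_1\Big(\boldsymbol{\sigma}, \mathbf{p} + \frac{e}{c}\mathbf{A}\Big) - \rho_3 mc,$$

we get 

$$\Big\{\Big(\frac{W}{c} + \frac{e}{c}A_0\Big)^2 - \Big(\boldsymbol{\sigma}, \mathbf{p} + \frac{e}{c}\mathbf{A}\Big)^2 - m^2c^2 + \rho_1\Big[\Big(\frac{W}{c} + \frac{e}{c}A_0\Big)\Big(\boldsymbol{\sigma}, \mathbf{p} + \frac{e}{c}\mathbf{A}\Big) - \Big(\boldsymbol{\sigma}, \mathbf{p} + \frac{e}{c}\mathbf{A}\Big)\Big(\frac{W}{c} + \frac{e}{c}A_0\Big)\Big]\Big\}\psi = 0. \qquad (24)$$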
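We now use the general formula, that, if $\mathbf{B}$ and $\mathbf{C}$ are any two vectors that commute 
with $\boldsymbol{\sigma}$, 

$$(\boldsymbol{\sigma}, \mathbf{B})(\boldsymbol{\sigma}, \mathbf{C}) = \sum_{xyz}\{\sigma_x^2 B_x C_x + \sigma_x\sigma_y B_x C_y + \sigma_y\sigma_x B_y C_x\}$$
$$= (\mathbf{B}, \mathbf{C}) + i\sum_{xyz}\sigma_z(B_x C_y - B_y C_x)$$
$$= (\mathbf{B}, \mathbf{C}) + i(\boldsymbol{\sigma}, \mathbf{B}\times\mathbf{C}). \qquad (25)$$

Taking $\mathbf{B} = \mathbf{C} = \mathbf{p} + e/c\cdot\mathbf{A}$, we find, since 

$$\Big(\mathbf{p} + \frac{e}{c}\mathbf{A}\Big)\times\Big(\mathbf{p} + \frac{e}{c}\mathbf{A}\Big) = \frac{e}{c}\{\mathbf{p}\times\mathbf{A} + \mathbf{A}\times\mathbf{p}\}$$
$$= -i\hbar e/c\cdot\operatorname{curl}\mathbf{A} = -i\hbar e/c\cdot\mathcal{H},$$

where $\mathcal{H}$ is the magnetic field, that 

$$\Big(\boldsymbol{\sigma}, \mathbf{p} + \frac{e}{c}\mathbf{A}\Big)^2 = \Big(\mathbf{p} + \frac{e}{c}\mathbf{A}\Big)^2 + \frac{\hbar e}{c}(\boldsymbol{\sigma}, \mathcal{H}).$$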





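Also we have 

$$\Big(\frac{W}{c} + \frac{e}{c}A_0\Big)\Big(\boldsymbol{\sigma}, \mathbf{p} + \frac{e}{c}\mathbf{A}\Big) - \Big(\boldsymbol{\sigma}, \mathbf{p} + \frac{e}{c}\mathbf{A}\Big)\Big(\frac{W}{c} + \frac{e}{c}A_0\Big)$$
$$= \frac{e}{c}\Big(\boldsymbol{\sigma}, \frac{W}{c}\mathbf{A} - \mathbf{A}\frac{W}{c} + A_0\mathbf{p} - \mathbf{p}A_0\Big)$$
$$= \frac{i\hbar e}{c}\Big(\boldsymbol{\sigma}, \frac{1}{c}\frac{\partial\mathbf{A}}{\partial t} + \operatorname{grad}A_0\Big) = -\frac{i\hbar e}{c}(\boldsymbol{\sigma}, \mathcal{E}),$$

where $\mathcal{E}$ is the electric field. Thus (24) becomes 

$$\Big\{\Big(\frac{W}{c} + \frac{e}{c}A_0\Big)^2 - \Big(\mathbf{p} + \frac{e}{c}\mathbf{A}\Big)^2 - m^2c^2 - \frac{\hbar e}{c}(\boldsymbol{\sigma}, \mathcal{H}) - i\rho_1\frac{\hbar e}{c}(\boldsymbol{\sigma}, \mathcal{E})\Big\}\psi = 0.$$

This equation differs from (23) through having two extra terms in the operator. 
The electron according to the present theory is more closely analogous to a classical 
system with the Hamiltonian function 

$$\Big(\frac{W}{c} + \frac{e}{c}A_0\Big)^2 - \Big(\mathbf{p} + \frac{e}{c}\mathbf{A}\Big)^2 - m^2c^2 - \frac{\hbar e}{c}(\boldsymbol{\sigma}, \mathcal{H}) - i\rho_1\frac{\hbar e}{c}(\boldsymbol{\sigma}, \mathcal{E}).$$

If we neglect relativity corrections, so that we can put $W = mc^2 + W_1$ and count 
$W_1$ as small, this Hamiltonian reduces, after division throughout by $2m$, to 

$$W_1 - \Big\{-eA_0 + \frac{1}{2m}\Big(\mathbf{p} + \frac{e}{c}\mathbf{A}\Big)^2 + \frac{\hbar e}{2mc}(\boldsymbol{\sigma}, \mathcal{H}) + i\rho_1\frac{\hbar e}{2mc}(\boldsymbol{\sigma}, \mathcal{E})\Big\}.$$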

We can now see that the two extra terms may be considered approximately as due 
to the electron possessing an additional potential energy of amount 

$$\frac{\hbar e}{2mc}(\boldsymbol{\sigma}, \mathcal{H}) + i\rho_1\frac{\hbar e}{2mc}(\boldsymbol{\sigma}, \mathcal{E}),$$

which may be interpreted as arising from the electron having a magnetic moment 
$-\hbar e/2mc\cdot\boldsymbol{\sigma}$ and an electric moment $-i\rho_1\hbar e/2mc\cdot\boldsymbol{\sigma}$. This magnetic moment is 
in agreement with the assumptions of §43 and is what is required by experiment. 
The electric moment, on the other hand, is an imaginary† quantity and thus cannot 
be considered as having a physical meaning. The Hamiltonian of our original wave 
equation (8) is real, and the imaginary term has appeared only on account of our 
having performed a rather artificial operation to get a Hamiltonian that can be 
compared with the classical one. 

†‘pure’ is redundant 

The spin angular momentum does not give rise to any potential energy and 
therefore does not appear in the result of the preceding calculation. The simplest 
way of showing the existence of the spin angular momentum is to take the case 
of the motion of an electron in a central field of force and determine the angular 
momentum integrals. We therefore take $\mathbf{A} = 0$ and $A_0$ a function of $r$ only, so that 
the wave equation (8) becomes 

$$(W - H)\psi = 0,$$

where 

$$H = -eA_0(r) - c\rho_1(\boldsymbol{\sigma}, \mathbf{p}) - \rho_3 mc^2. \qquad (26)$$

This $H$ is the Hamiltonian to be used in the equations of motion. 
If we take the $x$-component of orbital angular momentum, $m_x = yp_z - zp_y$, 
we find for its rate of change, with the help of commutability relations proved in 
§§44 and 45, 

$$i\hbar\dot{m}_x = m_x H - H m_x$$
$$= -c\rho_1\{m_x(\boldsymbol{\sigma}, \mathbf{p}) - (\boldsymbol{\sigma}, \mathbf{p})m_x\}$$
$$= -c\rho_1(\boldsymbol{\sigma}, m_x\mathbf{p} - \mathbf{p}m_x)$$
$$= -i\hbar c\rho_1\{\sigma_y p_z - \sigma_z p_y\}.$$

Thus $\dot{m}_x \neq 0$ and the orbital angular momentum is not a constant of the motion. 
We have further 

$$i\hbar\dot\sigma_x = \sigma_x H - H\sigma_x$$
$$= -c\rho_1\{\sigma_x(\boldsymbol{\sigma}, \mathbf{p}) - (\boldsymbol{\sigma}, \mathbf{p})\sigma_x\}$$
$$= -c\rho_1(\sigma_x\boldsymbol{\sigma} - \boldsymbol{\sigma}\sigma_x, \mathbf{p})$$
$$= -2ic\rho_1\{\sigma_z p_y - \sigma_y p_z\}$$

with the help of equations (42) of §43. Hence 

$$i\hbar(\dot{m}_x + \tfrac12\hbar\dot\sigma_x) = 0,$$

so that the vector $\mathbf{m} + \frac12\hbar\boldsymbol{\sigma}$ is a constant of the motion. This result one can interpret 
by saying the electron has a spin angular momentum $\frac12\hbar\boldsymbol{\sigma}$, which must be added 
to the orbital angular momentum $\mathbf{m}$ before one gets a constant of the motion. 
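The magnitude $\hbar e/2mc$ of the magnetic moment found in this section may be evaluated numerically. The sketch assumes CGS-Gaussian values of the constants (which are not given in the text) and a uniform magnetic field along one axis, so that the component of $\boldsymbol{\sigma}$ along the field has the eigenvalues $\pm 1$; the names are hypothetical.

```python
E_CHARGE = 4.80320471e-10    # esu (assumed CGS-Gaussian value)
HBAR = 1.054571817e-27       # erg s
M_ELECTRON = 9.1093837e-28   # g
C_LIGHT = 2.99792458e10      # cm / s

def magnetic_moment_magnitude(e=E_CHARGE, hbar=HBAR, m=M_ELECTRON, c=C_LIGHT):
    """hbar e / (2 m c), in erg per gauss for the default constants."""
    return hbar * e / (2 * m * c)

def extra_magnetic_energies(field_gauss):
    """Eigenvalues of (hbar e / 2mc)(sigma, H) for a field of the given strength."""
    mu = magnetic_moment_magnitude()
    return (mu * field_gauss, -mu * field_gauss)
```

The two energies differ only in sign, corresponding to the two orientations of the spin relative to the field.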


77. Transition to Polar Variables 


For the further study of the motion of an electron in a central field of force, it is 
convenient to make a transformation to polar coordinates, as was done in §45 in 
the non-relativity case. We can introduce $r$ and $p_r$ as before, but instead of $k$, 
the magnitude of the orbital angular momentum $\mathbf{m}$, which is no longer a constant 
of the motion, we must now use the magnitude of the total angular momentum 
$\mathbf{M} = \mathbf{m} + \frac12\hbar\boldsymbol{\sigma}$. If $j$ is this magnitude expressed in units of $\hbar$, we shall have 

$$j^2\hbar^2 = M_x^2 + M_y^2 + M_z^2 + \tfrac14\hbar^2. \qquad (27)$$

The eigenvalues of $m_z$ are integral multiples of $\hbar$, those of $\frac12\hbar\sigma_z$ are $\pm\frac12\hbar$, and hence 
those of $M_z$ must be half-odd integral multiples of $\hbar$. It follows from the general 
result of §30 that the eigenvalues of $j$ must be integers greater than zero. 

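If in formula (25) we take $\mathbf{B} = \mathbf{C} = \mathbf{m}$, we get 

$$(\boldsymbol{\sigma}, \mathbf{m})^2 = \mathbf{m}^2 + i(\boldsymbol{\sigma}, \mathbf{m}\times\mathbf{m})$$
$$= \mathbf{m}^2 - \hbar(\boldsymbol{\sigma}, \mathbf{m})$$
$$= (\mathbf{m} + \tfrac12\hbar\boldsymbol{\sigma})^2 - 2\hbar(\boldsymbol{\sigma}, \mathbf{m}) - \tfrac34\hbar^2.$$

Hence 

$$\{(\boldsymbol{\sigma}, \mathbf{m}) + \hbar\}^2 = \mathbf{M}^2 + \tfrac14\hbar^2.$$

Thus $(\boldsymbol{\sigma}, \mathbf{m}) + \hbar$ is a quantity whose square is $\mathbf{M}^2 + \frac14\hbar^2$ and we could, consistently 
with equation (27), define $j\hbar$ as $(\boldsymbol{\sigma}, \mathbf{m}) + \hbar$ instead of as the positive square root of 
$\mathbf{M}^2 + \frac14\hbar^2$. This would not be convenient, however, since we want $j$ to be a constant 
of the motion and $(\boldsymbol{\sigma}, \mathbf{m}) + \hbar$ is not constant. We have, in fact, from applications 
of (25), 

$$(\boldsymbol{\sigma}, \mathbf{m})(\boldsymbol{\sigma}, \mathbf{p}) = i(\boldsymbol{\sigma}, \mathbf{m}\times\mathbf{p})$$

and 

$$(\boldsymbol{\sigma}, \mathbf{p})(\boldsymbol{\sigma}, \mathbf{m}) = i(\boldsymbol{\sigma}, \mathbf{p}\times\mathbf{m}),$$

so that 

$$(\boldsymbol{\sigma}, \mathbf{m})(\boldsymbol{\sigma}, \mathbf{p}) + (\boldsymbol{\sigma}, \mathbf{p})(\boldsymbol{\sigma}, \mathbf{m}) = i\sum_{xyz}\sigma_x\{m_y p_z - m_z p_y + p_y m_z - p_z m_y\}$$
$$= i\sum_{xyz}\sigma_x\cdot 2i\hbar p_x = -2\hbar(\boldsymbol{\sigma}, \mathbf{p}),$$

or 

$$\{(\boldsymbol{\sigma}, \mathbf{m}) + \hbar\}(\boldsymbol{\sigma}, \mathbf{p}) + (\boldsymbol{\sigma}, \mathbf{p})\{(\boldsymbol{\sigma}, \mathbf{m}) + \hbar\} = 0.$$

Thus $(\boldsymbol{\sigma}, \mathbf{m}) + \hbar$ anticommutes with one of the terms in the expression (26) 
for $H$, namely the term $-c\rho_1(\boldsymbol{\sigma}, \mathbf{p})$, and commutes with the other two. It follows 
that $\rho_3\{(\boldsymbol{\sigma}, \mathbf{m}) + \hbar\}$ commutes with all the three terms in $H$ and is a constant 
of the motion. But the square of $\rho_3\{(\boldsymbol{\sigma}, \mathbf{m}) + \hbar\}$ is also $\mathbf{M}^2 + \frac14\hbar^2$. We can 
therefore take 

$$j\hbar = \rho_3\{(\boldsymbol{\sigma}, \mathbf{m}) + \hbar\}, \qquad (28)$$

which gives us a convenient rational definition for $j$ which is consistent with (27) 
and makes $j$ a constant of the motion. The eigenvalues of this $j$ are all positive 
and negative integers, excluding zero. 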
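The relation $\{(\boldsymbol{\sigma}, \mathbf{m}) + \hbar\}^2 = \mathbf{M}^2 + \frac14\hbar^2$ can be verified for a particular orbital angular momentum. The sketch takes units with $\hbar = 1$ and the standard three-rowed matrices for orbital angular momentum of magnitude 1, combined with the two-rowed $\sigma$'s by a tensor product; these choices and all names are assumptions made for the illustration, and the factor $\rho_3$ is left out since its square is unity.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def kron(a, b):
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(s, a):
    return [[s * x for x in row] for row in a]

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

I2 = [[1, 0], [0, 1]]
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
SIGMA = ([[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]])

# Orbital angular momentum matrices for l = 1 (basis m_z = 1, 0, -1), hbar = 1.
R2 = math.sqrt(2)
M_PLUS = [[0, R2, 0], [0, 0, R2], [0, 0, 0]]
M_MINUS = [[0, 0, 0], [R2, 0, 0], [0, R2, 0]]
ORBITAL = (
    scale(0.5, add(M_PLUS, M_MINUS)),                 # m_x
    scale(-0.5j, add(M_PLUS, scale(-1, M_MINUS))),    # m_y
    [[1, 0, 0], [0, 0, 0], [0, 0, -1]],               # m_z
)

ID6 = kron(I3, I2)

def total(parts):
    out = scale(0, ID6)
    for part in parts:
        out = add(out, part)
    return out

SIGMA_DOT_M = total([kron(m, s) for m, s in zip(ORBITAL, SIGMA)])
K = add(SIGMA_DOT_M, ID6)                     # (sigma, m) + hbar
M_TOTAL = [add(kron(m, I2), scale(0.5, kron(I3, s))) for m, s in zip(ORBITAL, SIGMA)]
M_SQUARED = total([matmul(m, m) for m in M_TOTAL])

K_SQUARED = matmul(K, K)
RIGHT_SIDE = add(M_SQUARED, scale(0.25, ID6))  # M^2 + 1/4, the right side of (27)
```

Comparing `K_SQUARED` with `RIGHT_SIDE` entry by entry confirms the relation for this case; the trace of either matrix is the sum of the values of $j^2$ over the six states.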

By a further application of (25), we get 


(o,x)(o,p) = (x, p) + #(o,m) 
= TPr + ipsjh, (29) 
with the help of (28) and also of equation (13) of Chapter VIII. We introduce 


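the observable $\epsilon$ defined by
$$r\epsilon = \rho_1(\boldsymbol{\sigma},\mathbf{x}). \tag{30}$$

Since $r$ commutes with $\rho_1$ and with $(\boldsymbol{\sigma},\mathbf{x})$, it must commute with $\epsilon$. We thus have
$$r^2\epsilon^2 = \{\rho_1(\boldsymbol{\sigma},\mathbf{x})\}^2 = (\boldsymbol{\sigma},\mathbf{x})^2 = r^2$$
or
$$\epsilon^2 = 1.$$
Since there is symmetry between $\mathbf{x}$ and $\mathbf{p}$ so far as angular momentum is concerned,
$\rho_1(\boldsymbol{\sigma},\mathbf{x})$, like $\rho_1(\boldsymbol{\sigma},\mathbf{p})$, must commute with $\mathbf{M}$ and $j$. Hence $\epsilon$ commutes with $\mathbf{M}$
and $j$. Further, $\epsilon$ must commute with $p_r$, since we have
$$(\boldsymbol{\sigma},\mathbf{x})(\mathbf{x},\mathbf{p}) - (\mathbf{x},\mathbf{p})(\boldsymbol{\sigma},\mathbf{x}) = (\boldsymbol{\sigma},\mathbf{x}(\mathbf{x},\mathbf{p}) - (\mathbf{x},\mathbf{p})\mathbf{x}) = ih(\boldsymbol{\sigma},\mathbf{x}),$$
which gives
$$r\epsilon(rp_r + ih) - (rp_r + ih)r\epsilon = ihr\epsilon$$
or
$$r\epsilon(p_rr + 2ih) - (rp_r + ih)r\epsilon = ihr\epsilon,$$
which reduces to
$$\epsilon p_r - p_r\epsilon = 0.$$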





** replaces ‘.’
From (29) and (30) we obtain
$$r\epsilon\rho_1(\boldsymbol{\sigma},\mathbf{p}) = rp_r + i\rho_3jh$$
or
$$\rho_1(\boldsymbol{\sigma},\mathbf{p}) = \epsilon p_r + i\epsilon\rho_3jh/r.$$
Thus
$$H/c = -e/c\cdot A_0 - \epsilon p_r - i\epsilon\rho_3jh/r - \rho_3mc.$$

This gives our Hamiltonian expressed in terms of polar variables. It should be
noticed that $\epsilon$ and $\rho_3$ commute with all the other variables occurring in $H$ and
anticommute with one another. This means that we can take a representation in
which $\epsilon$ and $\rho_3$ are represented respectively by the matrices

$$\begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \tag{31}$$

and in which $r$, say, is diagonal, and the wave function $(r|)$ will then have
two components, $(r|)_a$ and $(r|)_b$, say, referring to the two rows and columns of
the matrices.
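
The algebraic properties of $\epsilon$ used above lend themselves to a quick numerical check. The sketch below is an illustration only. It assumes a particular four-rowed representation of the $\rho$'s and $\sigma$'s (built as Kronecker products, which need not be the one used in §74), treats $\mathbf{x}$ as a fixed numerical vector, and takes as a two-rowed pair the matrices with rows $(0, -i), (i, 0)$ and $(1, 0), (0, -1)$, which is one admissible choice. All names are hypothetical.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

# Assumed representation: the rho's act on the first factor, the sigma's on the second.
RHO1 = np.kron(SX, I2)
RHO3 = np.kron(SZ, I2)
SIGMA = [np.kron(I2, s) for s in (SX, SY, SZ)]

# An admissible two-rowed pair for epsilon and rho_3 (assumed choice).
EPS_2 = np.array([[0, -1j], [1j, 0]])
RHO3_2 = np.array([[1, 0], [0, -1]], dtype=complex)


def epsilon(x):
    """epsilon = rho_1 (sigma, x) / r for a numerical position vector x."""
    x = np.asarray(x, dtype=float)
    r = np.linalg.norm(x)
    sigma_x = sum(xk * sk for xk, sk in zip(x, SIGMA))
    return RHO1 @ sigma_x / r


def anticommutator(a, b):
    """Return ab + ba."""
    return a @ b + b @ a
```

With `eps = epsilon([0.3, -1.2, 2.0])` one finds `eps @ eps` equal to the unit matrix and `anticommutator(eps, RHO3)` equal to zero, while the two-rowed pair has the same two properties.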





78. The Fine-Structure of the Energy-Levels of Hydrogen


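We shall now take the case of the hydrogen atom, for which $A_0 = e/r$, and work out
its energy-levels, given by the eigenvalues $H'$ of $H$. The equation $(H' - H)\psi = 0$
which defines these eigenvalues, when written in terms of representatives in
the representation discussed above with $\epsilon$ and $\rho_3$ represented by the matrices (31),
gives the equations

$$\Big(\frac{H'}{c}+\frac{e^2}{cr}\Big)(r|)_a - h\frac{\partial}{\partial r}(r|)_b - \frac{jh}{r}(r|)_b + mc\,(r|)_a = 0,$$

$$\Big(\frac{H'}{c}+\frac{e^2}{cr}\Big)(r|)_b + h\frac{\partial}{\partial r}(r|)_a - \frac{jh}{r}(r|)_a - mc\,(r|)_b = 0.$$

If we put
$$\frac{h}{mc + H'/c} = a_1, \qquad \frac{h}{mc - H'/c} = a_2, \tag{32}$$
these reduce to
$$\Big(\frac{1}{a_1}+\frac{\alpha}{r}\Big)(r|)_a - \Big(\frac{\partial}{\partial r}+\frac{j}{r}\Big)(r|)_b = 0,$$

$$\Big(-\frac{1}{a_2}+\frac{\alpha}{r}\Big)(r|)_b + \Big(\frac{\partial}{\partial r}-\frac{j}{r}\Big)(r|)_a = 0, \tag{33}$$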


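where $\alpha = e^2/hc$, which is a small number. We shall solve these equations by
a similar method to that used for equation (20) in §46.
Put
$$(r|)_a = e^{-r/a}f, \qquad (r|)_b = e^{-r/a}g,$$
introducing two new functions, $f$ and $g$, of $r$, where

$$a = h(m^2c^2 - H'^2/c^2)^{-1/2}. \tag{34}$$

Equations (33) become

$$\Big(\frac{1}{a_1}+\frac{\alpha}{r}\Big)f - \Big(\frac{\partial}{\partial r}-\frac{1}{a}+\frac{j}{r}\Big)g = 0,$$

$$\Big(-\frac{1}{a_2}+\frac{\alpha}{r}\Big)g + \Big(\frac{\partial}{\partial r}-\frac{1}{a}-\frac{j}{r}\Big)f = 0. \tag{35}$$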








We now try for a solution in which $f$ and $g$ are in the form of power series,
$$f = \sum_s c_sr^s, \qquad g = \sum_s c'_sr^s, \tag{36}$$

in which consecutive values of $s$ differ by unity though these values need not
be integers. Substituting these expressions for $f$ and $g$ in (35) and picking out
coefficients of $r^{s-1}$, we obtain

$$c_{s-1}/a_1 + \alpha c_s - (s+j)c'_s + c'_{s-1}/a = 0,$$
$$-c'_{s-1}/a_2 + \alpha c'_s + (s-j)c_s - c_{s-1}/a = 0. \tag{37}$$

By multiplying the first of these equations by $a$ and the second by $a_2$ and adding,
we can eliminate both $c_{s-1}$ and $c'_{s-1}$, since from (34) $a/a_1 = a_2/a$. This gives

$$c_s[a\alpha + a_2(s-j)] + c'_s[a_2\alpha - a(s+j)] = 0, \tag{38}$$

a relation which shows the connexion between the primed and unprimed $c$'s.
The boundary condition at $r = 0$ requires that the series (36) shall terminate
on the side of small $s$. If $s_0$ is the minimum value of $s$ for which $c_s$ and $c'_s$ do not
both vanish, we obtain from (37), by putting $s = s_0$ and $c_{s_0-1} = c'_{s_0-1} = 0$,
$$\alpha c_{s_0} - (s_0+j)c'_{s_0} = 0,$$
$$\alpha c'_{s_0} + (s_0-j)c_{s_0} = 0,$$
which give
$$\alpha^2 = -s_0^2 + j^2.$$

Since the boundary condition requires that the minimum value of $s$ shall be greater
than zero, we must take
$$s_0 = +\sqrt{j^2-\alpha^2}.$$
To investigate the convergence of the series (36) we shall determine the ratio
$c_s/c_{s-1}$ for large $s$. Equation (38) and the second of equations (37) give
approximately, when $s$ is large,

$$c_sa_2 = c'_sa$$
and
$$sc_s = c_{s-1}/a + c'_{s-1}/a_2.$$
Hence
$$c_s/c_{s-1} = 2/as.$$

The series (36) will therefore converge like
$$\sum_s\frac{1}{s!}\Big(\frac{2r}{a}\Big)^s$$

or $e^{2r/a}$. This result is similar to that obtained in §46 and allows us to infer,
as before, that all values of $H'$ are permissible for which $a$ is* imaginary,
i.e. for which, from (34), $H' > mc^2$, but of those values of $H'$ for which $a$ is real,
only those are permissible for which the series (36) terminate on the side of large $s$.
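
The asymptotic ratio $c_s/c_{s-1} = 2/as$ can be illustrated by running the recurrence numerically for an energy at which the series does not terminate. The sketch below is not part of the original argument. It assumes units with $h = m = c = 1$, so that $a_1 = 1/(1+H')$, $a_2 = 1/(1-H')$ and $a = (1-H'^2)^{-1/2}$, and it takes the recurrence in the form (37) with the starting ratio $\alpha c_{s_0} = (s_0+j)c'_{s_0}$. The function names and the sample energy $H' = 0.5$ are arbitrary choices.

```python
import math


def series_coefficients(energy, j, alpha, steps):
    """Coefficients (c_s, c'_s) for s = s0, s0 + 1, ..., s0 + steps.

    Assumed units: h = m = c = 1, so `energy` stands for H'/mc^2 with |energy| < 1.
    Returns the list of coefficient pairs together with a and s0.
    """
    a1 = 1.0 / (1.0 + energy)
    a2 = 1.0 / (1.0 - energy)
    a = 1.0 / math.sqrt(1.0 - energy * energy)
    s0 = math.sqrt(j * j - alpha * alpha)
    c, cp = s0 + j, alpha  # satisfies both starting equations at s = s0
    coeffs = [(c, cp)]
    for k in range(1, steps + 1):
        s = s0 + k
        # Recurrence written as a 2x2 linear system for (c_s, c'_s):
        #   alpha c_s - (s + j) c'_s = r1,   (s - j) c_s + alpha c'_s = r2
        r1 = -c / a1 - cp / a
        r2 = cp / a2 + c / a
        det = alpha * alpha + (s + j) * (s - j)
        c, cp = (alpha * r1 + (s + j) * r2) / det, (alpha * r2 - (s - j) * r1) / det
        coeffs.append((c, cp))
    return coeffs, a, s0


def scaled_ratio(energy, j, alpha, steps):
    """Return (c_s / c_{s-1}) * (a s / 2) at the last step; near 1 if the series grows like e^{2r/a}."""
    coeffs, a, s0 = series_coefficients(energy, j, alpha, steps)
    s = s0 + steps
    return coeffs[-1][0] / coeffs[-2][0] * a * s / 2.0
```

For example `scaled_ratio(0.5, 1, 1 / 137.036, 80)` comes out close to unity, in agreement with the estimate above. The number of steps is kept moderate because the coefficients themselves fall off like $1/s!$ and would eventually underflow in floating-point arithmetic.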

If the series (36) terminate with the terms $c_s$ and $c'_s$, so that $c_{s+1} = c'_{s+1} = 0$,
we obtain from (37) with $s+1$ substituted for $s$

$$c_s/a_1 + c'_s/a = 0,$$
$$-c'_s/a_2 - c_s/a = 0.$$
These two equations are equivalent on account of (34). When combined with (38),
they give

$$a_1[a\alpha + a_2(s-j)] = a[a_2\alpha - a(s+j)],$$
which reduces to
$$2a_1a_2s = a(a_2-a_1)\alpha$$
or
$$\frac{s}{a} = \frac{1}{2}\Big(\frac{1}{a_1}-\frac{1}{a_2}\Big)\alpha = \frac{H'\alpha}{ch},$$

with the help of (32). Squaring and using (34), we obtain

$$s^2(m^2c^2 - H'^2/c^2) = \alpha^2H'^2/c^2.$$

Hence
$$\frac{H'}{mc^2} = \Big(1+\frac{\alpha^2}{s^2}\Big)^{-1/2}.$$

The $s$ here, which specifies the last term in the series, must be greater than $s_0$ by
some integer not less than zero. Calling this integer $n$, we have

$$s = n + \sqrt{j^2-\alpha^2}$$

and thus
$$\frac{H'}{mc^2} = \Big\{1+\frac{\alpha^2}{(n+\sqrt{j^2-\alpha^2})^2}\Big\}^{-1/2}.$$

This formula gives the discrete energy-levels of the hydrogen spectrum and 
was first obtained by Sommerfeld working with Bohr’s orbit theory. There are two 








*‘pure’ is redundant
quantum numbers $n$ and $j$ involved, but owing to $\alpha^2$ being very small the energy
depends almost entirely on $n + |j|$. Values of $n$ and $|j|$ that give the same
$n + |j|$ give rise to a set of energy-levels lying very closely to one another, and to
the energy-level given by the non-relativistic formula (27) of §46 with $s = n + |j|$.
For a general value of $n$, $j$ can have any integral value except zero. The value
$n = 0$ is, however, exceptional as it makes equation (38) vanish identically. A closer
investigation shows that in this case only negative values for $j$ are allowed.*
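
Both the energy formula and the exceptional character of $n = 0$ can be examined numerically by running the recurrence (37) at the energy given by the formula and seeing whether the coefficients next after the $n$-th vanish. The sketch below is offered only as an illustration. It assumes units with $h = m = c = 1$, the definitions $a_1 = 1/(1+H')$, $a_2 = 1/(1-H')$, $a = (1-H'^2)^{-1/2}$, and an approximate value $1/137.036$ for $\alpha$. The function names and the tolerance are arbitrary.

```python
import math

ALPHA = 1 / 137.036  # approximate value of e^2/hc (assumed for illustration)


def level(n, j, alpha=ALPHA):
    """H'/mc^2 from the fine-structure formula for integers n >= 0 and j != 0."""
    s = n + math.sqrt(j * j - alpha * alpha)
    return 1.0 / math.sqrt(1.0 + (alpha / s) ** 2)


def coefficients(energy, j, alpha, steps):
    """Pairs (c_s, c'_s) for s = s0, ..., s0 + steps from the recurrence (units h = m = c = 1)."""
    a1 = 1.0 / (1.0 + energy)
    a2 = 1.0 / (1.0 - energy)
    a = 1.0 / math.sqrt(1.0 - energy * energy)
    s0 = math.sqrt(j * j - alpha * alpha)
    c, cp = s0 + j, alpha  # satisfies both starting equations at s = s0
    out = [(c, cp)]
    for k in range(1, steps + 1):
        s = s0 + k
        r1 = -c / a1 - cp / a
        r2 = cp / a2 + c / a
        det = alpha * alpha + (s + j) * (s - j)
        c, cp = (alpha * r1 + (s + j) * r2) / det, (alpha * r2 - (s - j) * r1) / det
        out.append((c, cp))
    return out


def terminates(n, j, alpha=ALPHA, tol=1e-8):
    """True if, at the energy level(n, j), the coefficients with index n + 1 vanish numerically."""
    coeffs = coefficients(level(n, j, alpha), j, alpha, n + 1)
    scale = max(math.hypot(c, cp) for c, cp in coeffs[: n + 1])
    return math.hypot(*coeffs[n + 1]) < tol * scale
```

With these definitions `terminates(1, 1)`, `terminates(2, -1)` and `terminates(0, -1)` hold, while `terminates(0, 1)` fails, which agrees with the statement that for $n = 0$ only negative $j$ is allowed. The closeness of levels with equal $n + |j|$ shows itself in `level(0, -2) - level(1, 1)`, which is of order $\alpha^4/32$.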


79. Physical Meaning of the Negative-Energy Solutions


It has been mentioned before that the wave equation for the electron admits of 
twice as many solutions as it ought to, half of them referring to states with negative 
values for the kinetic energy $W + eA_0$. This difficulty was introduced as soon as
we passed from equation (3) to equation (4) and is inherent in any relativity theory.
It occurs also in classical relativity theory, but is not then serious since, owing to
the continuity in the variation of all classical dynamical variables, if the kinetic
energy $W + eA_0$ is initially positive (when it must be greater than or equal to
$mc^2$), it cannot subsequently be negative (when it would have to be less than or
equal to $-mc^2$). In the quantum theory, however, discontinuous transitions may
take place, so that if the electron is initially in a state of positive kinetic energy 
it may make a transition to a state of negative kinetic energy. It is therefore no 
longer permissible simply to ignore the negative-energy states, as one can do in 
the classical theory. 
Let us examine the negative-energy solutions of the equation

$$\Big\{\Big(\frac{W}{c}+\frac{e}{c}A_0\Big) + \alpha_x\Big(p_x+\frac{e}{c}A_x\Big) + \alpha_y\Big(p_y+\frac{e}{c}A_y\Big) + \alpha_z\Big(p_z+\frac{e}{c}A_z\Big) + \alpha_mmc\Big\}\psi = 0 \tag{39}$$
a little more closely. For this purpose it is convenient to use a representation of
the $\alpha$'s in which all the elements of the matrices representing $\alpha_x$, $\alpha_y$ and $\alpha_z$ are real
and all those of the matrix representing $\alpha_m$ are§ imaginary. Such a representation
may be obtained, for instance, from that of §74 by interchanging the expressions for
$\alpha_y$ and $\alpha_m$ in (7). With such a representation, if we write $-i$ for $i$ in the operator





*Gordon, W. Die Energieniveaus des Wasserstoffatoms nach der Diracschen Quantentheorie
des Elektrons. Z. Physik 48, 11-14 (1928). https://doi.org/10.1007/BF01351570
§‘pure’ is redundant
of equation (39), we get, remembering (1) and (2),‡

$$\Big\{\Big(\frac{e}{c}A_0-\frac{W}{c}\Big) + \alpha_x\Big(\frac{e}{c}A_x-p_x\Big) + \alpha_y\Big(\frac{e}{c}A_y-p_y\Big) + \alpha_z\Big(\frac{e}{c}A_z-p_z\Big) - \alpha_mmc\Big\}\psi = 0. \tag{40}$$
Thus the conjugate complex of any wave function that is a solution of (39) is
a solution of (40). Further, if the solution of (39) belongs to a negative value for
$W + eA_0$, the conjugate complex solution of (40) will belong to a positive value
for $W - eA_0$. But equation (40) is just what one would get if one substituted
$-e$ for $e$ in (39). It follows that the conjugate complex of any solution of (39)
belonging to a negative value for $W + eA_0$ is a solution, belonging to a positive
value for $W - eA_0$, of the wave equation obtained from (39) by substitution of
$-e$ for $e$, and therefore represents an electron of charge $+e$, instead of $-e$, moving
through the given electromagnetic field. Thus the unwanted solutions of (39) are
connected with the motion of an electron with a charge $+e$. (It is not possible,
of course, with an arbitrary electromagnetic field, to separate the solutions of (39)
definitely into those referring to positive and those referring to negative values
for $W + eA_0$, as such a separation would imply that transitions from one kind to
the other do not occur. The preceding discussion is therefore only a rough one,
applying to the case when such a separation is approximately possible.)
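
The connexion between the negative-energy solutions and the motion of a charge $+e$ can be illustrated numerically in the simple case of constant potentials, where $W$ and the components of $\mathbf{p}$ may be treated as numbers and taking the conjugate complex of a plane wave reverses their signs. The following sketch rests on assumptions that are not in the text: a Kronecker-product representation in which $\alpha_x = \rho_1\sigma_x$, $\alpha_y = \rho_3$, $\alpha_z = \rho_1\sigma_z$ are real and $\alpha_m = \rho_1\sigma_y$ is imaginary, units with $m = c = 1$ by default, and arbitrary numerical values for the field. All names are hypothetical.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

# Assumed representation with alpha_x, alpha_y, alpha_z real and alpha_m imaginary.
ALPHAS = [np.kron(SX, SX), np.kron(SZ, I2), np.kron(SX, SZ)]
ALPHA_M = np.kron(SX, SY)


def operator_39(w, p, e, a0, a_vec, m=1.0, c=1.0):
    """Matrix of the operator of (39) for constant potentials, W and p being numbers."""
    kinetic = [p[k] + e / c * a_vec[k] for k in range(3)]
    return ((w / c + e / c * a0) * np.eye(4)
            + sum(kk * al for kk, al in zip(kinetic, ALPHAS))
            + m * c * ALPHA_M)


def conjugation_check(p, a0, a_vec, e=1.0, m=1.0, c=1.0):
    """Build a negative-energy solution u of (39) and test its conjugate against charge -e -> +e.

    Returns (residual for u, residual for conj(u) with e -> -e, W + e A0, W' - e A0),
    where W' = -W is the value belonging to the conjugate complex solution.
    """
    kinetic = [p[k] + e / c * a_vec[k] for k in range(3)]
    h_mat = sum(kk * al for kk, al in zip(kinetic, ALPHAS)) + m * c * ALPHA_M
    eigenvalues, vectors = np.linalg.eigh(h_mat)
    root = eigenvalues[-1]          # +sqrt(kinetic^2 + m^2 c^2)
    u = vectors[:, -1]
    w = -c * root - e * a0          # makes W/c + (e/c) A0 = -root, a negative kinetic energy
    res_u = np.linalg.norm(operator_39(w, p, e, a0, a_vec, m, c) @ u)
    flipped_p = [-pk for pk in p]
    res_conj = np.linalg.norm(operator_39(-w, flipped_p, -e, a0, a_vec, m, c) @ u.conj())
    return res_u, res_conj, w + e * a0, -w - e * a0
```

Calling `conjugation_check([0.3, -0.2, 0.5], 0.7, [0.1, 0.0, -0.4])` gives two residuals that vanish to rounding error, a negative value for $W + eA_0$ and a positive value for $W - eA_0$ belonging to the conjugate complex solution.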

In this way we are led to infer that the negative-energy solutions of (39) refer 
to the motion of protons or hydrogen nuclei, although there remains the difficulty 
of the great difference in the masses. We cannot, however, simply assert that 
the negative-energy solutions represent protons, as this would make the dynamical 
relations all wrong. For instance, it is certainly not true that a proton has 
a negative kinetic energy. We must therefore establish the protons on a somewhat 
different footing. We assume that nearly all the negative-energy states are 
occupied, with one electron in each state in accordance with the exclusion principle 
of Wolfgang Pauli. An unoccupied negative-energy state will now appear as 
something with a positive energy, since to make it disappear, i.e. to fill it up,
we should have to add to it an electron with negative energy. We assume that 
these unoccupied negative-energy states are the protons. 

These assumptions require there to be a distribution of electrons of infinite 
density everywhere in the world. A perfect vacuum is a region where all the states 
of positive energy are unoccupied and all those of negative energy are occupied. 
In a perfect vacuum Maxwell’s equation

$$\operatorname{div}\mathcal{E} = 0$$
must, of course, be valid. This means that the infinite distribution of
negative-energy electrons does not contribute to the electric field. Only departures 








‡The original’s bracketed pairs of terms are swapped in order but the commutative law of
addition always applies.
from the distribution in a vacuum will contribute to the electric density $\rho$ in
Maxwell’s equation
$$\operatorname{div}\mathcal{E} = -4\pi\rho.$$


Thus there will be a contribution —e for each occupied state of positive energy 
and a contribution +e for each unoccupied state of negative energy. 

The exclusion principle will operate to prevent a positive-energy electron 
ordinarily from making transitions to states of negative energy. It will still 
be possible, however, for such an electron to drop into an unoccupied state of 
negative energy. In this case we should have an electron and proton disappearing 
simultaneously, their energy being emitted in the form of radiation. Such processes 
probably actually occur in nature. 

The present theory is very symmetrical between the electrons and protons. 
The symmetry is not mathematically perfect, as may easily be verified, 
when one takes interaction between the electrons into account. This cause, 
however, hardly appears to be sufficient, according to present ideas, to account 
for the very considerable observed differences between electrons and protons, 
in particular their different masses. Possibly the solution of this difficulty will 
be found in a better understanding of the nature of interaction.